Metadata-Version: 2.4
Name: primer-target-planner
Version: 0.1.0
Summary: Interval-based PCR target window planner
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"

# primer-target-planner

Interval-based PCR target window planner.

Given a set of required genomic intervals (e.g. CDS exons) and PCR product size
constraints, the planner generates the minimal set of target windows that fully
cover every required interval.

This is a **pure algorithm library** — no primer design, no external
bioinformatics services.

> **This library uses 0-based half-open intervals: `[start, end)`, `length = end - start`.**
>
> All coordinates in `RequiredInterval`, `PlanningBounds`, and `TargetWindow`
> follow this convention.  For example, `RequiredInterval("exon1", 1000, 1200)`
> covers positions 1000–1199 inclusive (200 bp).
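
As a quick illustration of the convention, the sketch below uses a hypothetical
`HalfOpen` dataclass (a stand-in written for this example, not one of the
library's types) to show how the length, the last covered position, and the
equivalent 1-based closed coordinates follow from `[start, end)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HalfOpen:
    """Hypothetical stand-in for a 0-based half-open interval [start, end)."""

    start: int  # 0-based, inclusive
    end: int    # exclusive

    @property
    def length(self) -> int:
        # Half-open intervals need no "+ 1" correction.
        return self.end - self.start

    def last_position(self) -> int:
        # Last base that is actually inside the interval.
        return self.end - 1

    def to_one_based_closed(self) -> tuple[int, int]:
        # Same bases expressed in 1-based, fully closed coordinates
        # (the style used by GFF/GTF files).
        return self.start + 1, self.end


exon1 = HalfOpen(1000, 1200)
print(exon1.length)                 # 200
print(exon1.last_position())        # 1199
print(exon1.to_one_based_closed())  # (1001, 1200)
```

If your annotations come from a 1-based closed source, subtract 1 from the
start (and leave the end unchanged) before building a `RequiredInterval`.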

## Install

```bash
pip install -e ".[dev]"
```

## Quick start

```python
from primer_target_planner import (
    plan_targets,
    PlannerConfig,
    RequiredInterval,
)

intervals = [
    RequiredInterval("exon1", 1000, 1200),  # 200 bp
    RequiredInterval("exon2", 1500, 1800),  # 300 bp
    RequiredInterval("exon3", 2200, 2500),  # 300 bp
]

config = PlannerConfig(
    product_min=600,
    product_max=1000,
    strand="+",
)

targets = plan_targets(intervals, config)
for t in targets:
    print(
        f"[{t.start}, {t.end})  len={t.length}  mode={t.planning_mode}  "
        f"covers={t.covered_ids}  reason={t.reason}"
    )
```

## Negative-strand example

On the **negative strand** the planner processes intervals from high genomic
coordinates (transcript 5') to low coordinates (transcript 3').  Input and
output coordinates are always genomic `start < end` — the strand only affects
**planning order**, not coordinate direction.

```python
from primer_target_planner import (
    plan_targets,
    PlannerConfig,
    RequiredInterval,
)

# Four exons on the minus strand.
# Transcript order (5'→3'): exonD → exonC → exonB → exonA
# (high genomic coords → low genomic coords)
intervals = [
    RequiredInterval("exonA", 300, 400),
    RequiredInterval("exonB", 700, 800),
    RequiredInterval("exonC", 1100, 1200),
    RequiredInterval("exonD", 1500, 1600),
]

config = PlannerConfig(product_min=500, product_max=900, strand="-")
targets = plan_targets(intervals, config)

for t in targets:
    # start < end always — genomic coordinates, not transcript direction
    print(
        f"[{t.start}, {t.end})  len={t.length}  mode={t.planning_mode}  "
        f"covers={t.covered_ids}"
    )
# Possible output:
# [701, 1600)  len=899  mode=product_max  covers=['exonD', 'exonC']
# [300, 800)   len=500  mode=product_max  covers=['exonB', 'exonA']
```

## API

### `RequiredInterval`

| Field      | Type              | Description                              |
|------------|-------------------|------------------------------------------|
| `id`       | `str`             | Identifier (e.g. exon name)              |
| `start`    | `int`             | Genomic start (0-based, inclusive)       |
| `end`      | `int`             | Genomic end (exclusive)                  |
| `metadata` | `dict \| None`    | Optional user metadata                   |

All coordinates are 0-based half-open `[start, end)`.
`length = end - start`.

### `PlannerConfig`

| Field          | Type           | Default | Description                           |
|----------------|----------------|---------|---------------------------------------|
| `product_min`  | `int`          | —       | Minimum PCR product length (bp)       |
| `product_max`  | `int`          | —       | Maximum PCR product length (bp)       |
| `strand`       | `"+" \| "-"`   | —       | Transcript strand                     |
| `tile_overlap` | `int`          | `200`   | Overlap between tiles for long spans  |
| `allow_overlap`| `bool`         | `True`  | Allow adjacent targets to overlap     |

### `PlanningBounds`

| Field   | Type  | Description                                      |
|---------|-------|--------------------------------------------------|
| `start` | `int` | Gene / transcript genomic start (inclusive)      |
| `end`   | `int` | Gene / transcript genomic end (exclusive)        |

0-based half-open `[start, end)`.  `length = end - start`.

### `TargetWindow`

| Field          | Type        | Description                               |
|----------------|-------------|-------------------------------------------|
| `start`        | `int`       | Genomic start (inclusive)                 |
| `end`          | `int`       | Genomic end (exclusive)                   |
| `length`       | `int`       | `end - start`                             |
| `covered_ids`  | `list[str]` | IDs of fully covered intervals            |
| `anchor_id`    | `str`       | The interval that anchored this target    |
| `anchor_side`  | `"5prime" \| "3prime"` | Anchor side                |
| `planning_mode`| `str`       | `product_min`, `product_max`, `single`, `terminal_reverse`, `tiled` |
| `reason`       | `str`       | Human-readable explanation                |

### `plan_targets(intervals, config, bounds=None) -> list[TargetWindow]`

Main entry point.

- `intervals`: required intervals (any order; sorted internally).
- `config`: product-size and strand configuration.
- `bounds`: optional gene extent; enables terminal-reverse logic.

Returns target windows in transcript 5'→3' order.

## Algorithm

### Min-first / max-rescue planner

Processing proceeds from the transcript 5' end:

1. **Try `product_min`** — if a window of `product_min` bp can fully cover the
   next consecutive required interval when placed at the current anchor, merge
   that interval into the current target. Continue merging while the same
   window still covers the following interval.

2. **Try `product_max`** — if `product_min` cannot cover the next interval but
   `product_max` can, use `product_max` and merge all intervals it covers.

3. **Independent target** — if neither size covers the next interval, the
   current anchor becomes its own target and the next interval starts a new
   anchor.

4. **Terminal reverse** — if a forward window from the current anchor would
   extend past the gene 3' boundary, instead anchor at the gene 3' end and
   extend toward 5'. Tries `product_min` first; upgrades to `product_max` if
   the previous interval can also be covered.
   Window on the `+` strand: `[gene_end - product_size, gene_end)`.

5. **Tiling** — when a single required interval exceeds `product_max`, it is
   automatically tiled into overlapping windows of `product_max` bp with
   `tile_overlap` bp overlap.
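
The steps above can be sketched for the simplest case: `+` strand, no bounds
(so no terminal reverse). This is one possible reading of steps 1 to 3 and 5,
not the library's implementation. The helper names (`covered_run`,
`tile_interval`, `plan_forward`), the tuple-based return type, the choice of a
`product_min`-bp window for a `single` target, and the way the last tile is
shifted back to end at the interval end are all assumptions made for
illustration. Input intervals are assumed not to overlap each other.

```python
def covered_run(ordered, i, window_end):
    """Return one past the last consecutive interval from i ending by window_end."""
    j = i
    while j < len(ordered) and ordered[j][2] <= window_end:
        j += 1
    return j


def tile_interval(start, end, product_max, tile_overlap):
    """Step 5 (sketch): tile an interval longer than product_max."""
    step = product_max - tile_overlap
    if step <= 0 or end - start <= product_max:
        raise ValueError("need tile_overlap < product_max < interval length")
    tiles = []
    pos = start
    while pos + product_max < end:
        tiles.append((pos, pos + product_max))
        pos += step
    # Assumption: the last tile is shifted back so it ends at the interval end.
    tiles.append((end - product_max, end))
    return tiles


def plan_forward(intervals, product_min, product_max, tile_overlap=200):
    """Greedy sketch for the '+' strand without bounds.

    intervals: list of (id, start, end), 0-based half-open, non-overlapping.
    Returns a list of (start, end, mode, ids) tuples.
    """
    ordered = sorted(intervals, key=lambda iv: iv[1])
    targets = []
    i = 0
    while i < len(ordered):
        name, start, end = ordered[i]
        j_min = covered_run(ordered, i, start + product_min)
        j_max = covered_run(ordered, i, start + product_max)
        if j_max == i:
            # Even product_max cannot span the anchor itself: tile it.
            # No single tile fully covers the interval; the id is kept
            # for bookkeeping only (assumption).
            for t_start, t_end in tile_interval(start, end, product_max, tile_overlap):
                targets.append((t_start, t_end, "tiled", [name]))
            i += 1
            continue
        if j_max > j_min:
            # Step 2: product_max reaches intervals that product_min misses.
            size, mode, j = product_max, "product_max", j_max
        elif j_min - i >= 2:
            # Step 1: product_min already merges at least one more interval.
            size, mode, j = product_min, "product_min", j_min
        else:
            # Step 3: neither size reaches the next interval.
            size, mode, j = product_min, "single", j_min
        targets.append((start, start + size, mode, [iv[0] for iv in ordered[i:j]]))
        i = j
    return targets


plan = plan_forward(
    [("exon1", 1000, 1200), ("exon2", 1500, 1800), ("exon3", 2200, 2500)],
    product_min=600,
    product_max=1000,
)
for window in plan:
    print(window)
```

For the Quick start intervals this sketch returns a `product_max` window
`[1000, 2000)` covering `exon1` and `exon2`, followed by a `single` window
`[2200, 2800)` for `exon3`. The real planner may choose differently, in
particular once `bounds` or the `-` strand come into play.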

### Coverage rule

A required interval is considered **fully covered** only when:

```
target.start <= interval.start  AND  target.end >= interval.end
```

**Partial coverage does not count.**  A target that overlaps an interval but
does not span its full extent does not mark that interval as covered.
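
The rule translates directly into a small predicate. The sketch below uses
plain `(start, end)` tuples and a hypothetical `fully_covers` helper rather
than the library's own types.

```python
def fully_covers(target: tuple[int, int], interval: tuple[int, int]) -> bool:
    """True only if the half-open target spans the whole half-open interval."""
    t_start, t_end = target
    i_start, i_end = interval
    return t_start <= i_start and t_end >= i_end


exon2 = (1500, 1800)
print(fully_covers((1000, 1800), exon2))  # True: ends exactly at the exon end
print(fully_covers((1000, 1799), exon2))  # False: misses the last base
print(fully_covers((1501, 2100), exon2))  # False: misses the first base
```

Because both coordinates are half-open, a target whose `end` equals the
interval's `end` still counts as covering it.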

### Bounds behaviour

| `bounds` provided? | Behaviour |
|---|---|
| **No** (`None`) | Gene extent is inferred from the intervals themselves. Windows may extend freely beyond the inferred span. Terminal reverse is **not** triggered (there is no external 3' boundary to respect). |
| **Yes** | The planner keeps all windows within `[bounds.start, bounds.end)`. When a forward window would extend past the 3' boundary, **terminal reverse** anchors at `bounds.end` and extends toward 5'. |

Providing bounds is recommended when you know the gene / transcript extent — it
prevents targets from stretching beyond the biological region and enables the
terminal-reverse optimisation at the 3' end.
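
To make the bounded case concrete, here is a minimal sketch of the decision
between a forward window and a terminal-reverse window on the `+` strand. The
function names are hypothetical, and clamping the window at `bounds.start`
when the gene is shorter than the product size is an assumption of this
sketch, not documented behaviour.

```python
def terminal_reverse_window(gene_start, gene_end, product_size):
    """Window [gene_end - product_size, gene_end), anchored at the gene 3' end."""
    # Assumption: never start before the gene 5' boundary.
    start = max(gene_start, gene_end - product_size)
    return start, gene_end


def choose_window(anchor_start, product_size, bounds):
    """Pick a forward window if it fits inside bounds, else terminal reverse."""
    gene_start, gene_end = bounds
    forward_end = anchor_start + product_size
    if forward_end <= gene_end:
        return (anchor_start, forward_end), "forward"
    return terminal_reverse_window(gene_start, gene_end, product_size), "terminal_reverse"


bounds = (900, 2600)  # hypothetical gene extent, 0-based half-open
print(choose_window(1000, 600, bounds))  # ((1000, 1600), 'forward')
print(choose_window(2200, 600, bounds))  # ((2000, 2600), 'terminal_reverse')
```

In the second call a forward window `[2200, 2800)` would overrun the gene end
at 2600, so the window is anchored at 2600 instead and still spans an exon at
`[2200, 2500)`.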

### Strand handling

- **"+" strand**: 5' is at low genomic coordinates; intervals are processed in
  ascending genomic order.
- **"-" strand**: 5' is at high genomic coordinates; intervals are processed in
  descending genomic order.
- **All output coordinates are genomic `start < end`.**
  The strand only affects planning order, never coordinate direction.
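
A minimal sketch of what "planning order" means, using `(id, start, end)`
tuples and a hypothetical `planning_order` helper (the sort keys are an
assumption about how such an ordering could be implemented):

```python
def planning_order(intervals, strand):
    """Return intervals in transcript 5'->3' order without changing coordinates."""
    if strand == "+":
        # 5' end is at low genomic coordinates.
        return sorted(intervals, key=lambda iv: iv[1])
    if strand == "-":
        # 5' end is at high genomic coordinates.
        return sorted(intervals, key=lambda iv: iv[2], reverse=True)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


exons = [
    ("exonA", 300, 400),
    ("exonB", 700, 800),
    ("exonC", 1100, 1200),
    ("exonD", 1500, 1600),
]
print([iv[0] for iv in planning_order(exons, "+")])  # ['exonA', 'exonB', 'exonC', 'exonD']
print([iv[0] for iv in planning_order(exons, "-")])  # ['exonD', 'exonC', 'exonB', 'exonA']
```

Only the iteration order changes; every tuple keeps `start < end` in genomic
coordinates.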

## Running tests

```bash
python -m pytest -q
```
