Metadata-Version: 2.1
Name: ask-ecdna
Version: 0.1.1
Summary: AmpliconSeeK: a Python toolkit for detecting amplified genomic structures and candidate extrachromosomal DNA from sequencing data
Home-page: https://github.com/nanawei11/AmpliconSeeK/
Author: Nana Wei
Author-email: nanawei11@163.com
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: pysam
Requires-Dist: matplotlib
Requires-Dist: scipy
Requires-Dist: statsmodels
Requires-Dist: seaborn
Requires-Dist: scikit-learn


# AmpliconSeeK (ASK)

AmpliconSeeK (ASK) is a Python toolkit for detecting and reconstructing amplified genomic structures and candidate extrachromosomal DNA (ecDNA) from indexed alignment files, supporting both de novo discovery and targeted search of known ecDNA structures.

**Current version:** `0.1.1`


## Table of contents

- [Overview](#overview)
- [Software dependencies](#software-dependencies)
- [Installation](#installation)
  - [How to install Python and required packages](#how-to-install-python-and-required-packages)
- [Input data preparation](#input-data-preparation)
  - [Required input data](#required-input-data)
  - [BAM](#bam)
  - [Reference annotation data](#reference-annotation-data)
  - [Known ecDNA structure for search](#known-ecdna-structure-for-search)
- [De novo ecDNA detection](#de-novo-ecdna-detection)
  - [How to run from BAM file](#how-to-run-from-bam-file)
  - [Output](#output)
  - [Segmentation mode](#segmentation-mode)
  - [How to prepare BAM file](#how-to-prepare-bam-file)
- [Targeted ecDNA search](#targeted-ecdna-search)
  - [What search mode does](#what-search-mode-does)
  - [Parameters](#parameters)
  - [Output](#output-1)
- [Output files](#output-files)
- [File formats](#file-formats)
  - [Amplicon tables](#amplicon-tables)
  - [Copy number tables](#copy-number-tables)
  - [Breakpoint tables](#breakpoint-tables)
  - [Single-cell matrix files](#single-cell-matrix-files)
  - [Run summary](#run-summary)
  - [JCS file](#jcs-file)
- [Algorithm overview](#algorithm-overview)
- [Checkpointing and modular usage](#checkpointing-and-modular-usage)
- [License](#license)
- [Contact](#contact)

## Overview

Extrachromosomal DNA (ecDNA) is a dynamic form of oncogene amplification that contributes to cancer progression through high-copy gene dosage, regulatory rewiring, and cell-to-cell heterogeneity. AmpliconSeeK (ASK) is a computational framework for identifying ecDNA-associated amplicon structures from diverse high-throughput sequencing data, including WGS, WES, ChIP-seq, MNase-seq, ATAC-seq, scATAC-seq, and target-capture sequencing. ASK integrates copy-number signal from genomic bin counts with breakpoint-level evidence, including soft-clipped reads, split reads, supplementary alignments, breakpoint pairs, and junction sequences, to infer amplified segments and reconstruct candidate circular or linear amplicons. Candidate structures are annotated with genes, cancer genes, and super-enhancers and visualized with ASK-style amplicon plots.

ASK provides two main workflows:

| Workflow          | Command      | Description                                                                                             |
| ----------------- | ------------ | ------------------------------------------------------------------------------------------------------- |
| De novo detection | `ask`        | Detect amplified segments, breakpoint pairs, and candidate circular amplicons directly from a BAM file. |
| Targeted search   | `ask-search` | Search a new BAM file for evidence supporting a known ecDNA structure.                                  |


## Software dependencies

* ASK has been tested on macOS and Linux.
* ASK uses indexed alignment files and standard Python packages.
* ASK requires Python 3.8 or later; required Python packages include pysam, pandas, numpy, statsmodels, matplotlib, seaborn, scipy, and scikit-learn.

## Installation

### How to install Python and required packages

Install Miniconda by following the [conda installation guide](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html). On Linux (x86_64):

```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```

Set up bioconda channels:

```bash
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```

Create an environment with the required Python packages:

```bash
conda create -n ask --no-channel-priority pysam pandas numpy matplotlib statsmodels seaborn scipy scikit-learn
```

Activate the environment and install ASK:

```bash
conda activate ask
pip install ask-ecdna
```

Now, you are ready to run ASK:

```bash
ask --help
ask-search --help
```
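To confirm that the environment can import every package listed under Software dependencies, a small standard-library check such as the one below can help. It is a hypothetical sketch, not part of ASK; note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Import names for the packages ASK depends on (scikit-learn imports as "sklearn").
REQUIRED_MODULES = [
    "numpy", "pandas", "pysam", "matplotlib",
    "scipy", "statsmodels", "seaborn", "sklearn",
]


def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All ASK dependencies are importable.")
```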

## Input data preparation

### Required input data

ASK requires the following input data:

| Data Type                 | Required for `ask` | Required for `ask-search` | Description                                             |
| ------------------------- | ------------------ | ------------------------- | ------------------------------------------------------- |
| BAM                       | Yes                | Yes                       | Sorted and indexed alignment file                       |
| BAM index                 | Yes                | Yes                       | `.bai` index file                                       |
| Genome annotation         | Recommended        | Recommended               | Gene annotation BED12 file                              |
| Cancer gene list          | Optional           | Optional                  | Cancer gene census file                                 |
| Super-enhancer annotation | Optional           | Optional                  | BED file for SE annotation                              |
| Known ecDNA structure     | No                 | Yes                       | ASK circular table or manually prepared known structure |

### BAM

The input alignment file should be coordinate-sorted and indexed:

```text
sample.bam
sample.bam.bai
```
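A quick pre-flight check can catch a missing index before a long run. The helper below is hypothetical (not part of ASK) and only looks for index files next to the BAM using common naming conventions; `samtools index` writes `sample.bam.bai` by default. Sort order can be confirmed separately, for example by checking that the `@HD` line printed by `samtools view -H sample.bam` reports `SO:coordinate`.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the first existing index file for a BAM, or None if none is found.

    Checks common naming conventions: sample.bam.bai, sample.bai, and
    sample.bam.csi.
    """
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # samtools index default
        bam.with_suffix(".bai"),           # sample.bai
        bam.with_name(bam.name + ".csi"),  # CSI index (samtools index -c)
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None
```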

For de novo detection, duplicate marking is recommended before running ASK.

### Reference annotation data

ASK includes commonly used annotation files under `data/`. For example, with `--genome hg38`, ASK expects files such as:

```text
data/hg38_refgene_process.bed12
data/se_hg38_sort.bed
data/Census_all_20200624_14_22_39.tsv
```

Custom annotation files can be provided manually:

```bash
--genefile /path/to/gene.bed12
--sefile /path/to/super_enhancer.bed
--cgfile /path/to/cancer_gene.tsv
```

The genome build used by the BAM file and annotation files should match.
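One quick way to spot a naming mismatch is to compare the contig names in the BAM header with the chromosome column of an annotation file. With pysam, BAM contig names are available as `pysam.AlignmentFile(path).references`. Both helpers below are hypothetical. Matching names do not prove matching builds (hg19 and hg38 both use `chr`-prefixed names); this check only catches naming-style mismatches such as `chr7` versus `7`.

```python
def bed_chromosomes(bed_path):
    """Collect chromosome names from the first column of a BED-like file."""
    chroms = set()
    with open(bed_path) as handle:
        for line in handle:
            # Skip blank lines and common header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chroms.add(line.split("\t", 1)[0].strip())
    return chroms


def shared_contigs(bam_contigs, bed_path):
    """Return contigs present in both the BAM header and the annotation file."""
    return sorted(set(bam_contigs) & bed_chromosomes(bed_path))
```

An empty result for a BAM that should overlap the annotation usually points to a `chr` prefix mismatch.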

### Known ecDNA structure for search

`ask-search` accepts an ASK circular amplicon table:

```text
*_ask_amplicon_circular.tsv
```

It also accepts a manually prepared, tab-separated known-structure table. At minimum, the table should contain these columns:

| AmpliconID | Chrom |    Start |      End |
| ---------- | ----- | -------: | -------: |
| circ_0     | chr7  | 54830975 | 56117062 |

If segment-level order and strand are available, include them:

| AmpliconID | Chrom |    Start |      End | Strand |
| ---------- | ----- | -------: | -------: | ------ |
| circ_0     | chr7  | 54830975 | 55200000 | +      |
| circ_0     | chr7  | 55500000 | 56117062 | +      |

ASK uses the known structure to derive reference breakpoint pairs for targeted search.
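As an illustration of that derivation, the sketch below treats rows as the segment order around the circle and joins the exit point of each segment to the entry point of the next, wrapping from the last segment back to the first. This is a hypothetical helper, not ASK's parser: it assumes row order encodes circle order, expects a strand for every segment (use `+` when none is known), and reports bare coordinates rather than ASK's clip-side (`L`/`R`) notation.

```python
from collections import defaultdict


def reference_junctions(segments):
    """Derive circular junctions from ordered, stranded segments.

    segments: (amplicon_id, chrom, start, end, strand) tuples, listed in their
    order around the circle. Returns {amplicon_id: [(chrom_a, pos_a, chrom_b, pos_b), ...]}
    where each junction joins the exit of one segment to the entry of the next.
    """
    by_circle = defaultdict(list)
    for amplicon_id, chrom, start, end, strand in segments:
        by_circle[amplicon_id].append((chrom, int(start), int(end), strand))

    junctions = {}
    for amplicon_id, segs in by_circle.items():
        pairs = []
        for i, (chrom, start, end, strand) in enumerate(segs):
            # A "+" segment is traversed start -> end, a "-" segment end -> start.
            exit_pos = end if strand == "+" else start
            # The last segment wraps around to the first one to close the circle.
            next_chrom, next_start, next_end, next_strand = segs[(i + 1) % len(segs)]
            entry_pos = next_start if next_strand == "+" else next_end
            pairs.append((chrom, exit_pos, next_chrom, entry_pos))
        junctions[amplicon_id] = pairs
    return junctions
```

For the two-segment example above, this yields the junctions `chr7:55200000 -> chr7:55500000` and `chr7:56117062 -> chr7:54830975`; a single-segment circle yields only its End-to-Start closing junction.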

ASK output filenames follow this convention:

```text
{outprefix}_ask_{result_name}.tsv
```

For example:

```text
sample_ask_amplicon_circular.tsv
sample_ask_breakpoint_pair.tsv
sample_ask_bin_count_norm.tsv
```
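When scripting around ASK, it can help to compute these paths rather than hard-code them. A hypothetical helper that applies the convention to an `-o` prefix (for example, `-o exampledata/testdata/samplename` places files in `exampledata/testdata/`); the `suffix` argument covers non-TSV outputs such as `*_ask_clip_count.bedgraph`:

```python
from pathlib import Path


def ask_output_path(outprefix, result_name, suffix=".tsv"):
    """Build an ASK-style output path: {outprefix}_ask_{result_name}{suffix}."""
    prefix = Path(outprefix)
    return prefix.with_name(f"{prefix.name}_ask_{result_name}{suffix}")
```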

## De novo ecDNA detection

### How to run from BAM file

Run the example BAM file included in this repository:

```bash
cd /path/to/AmpliconSeeK

ask \
  -i exampledata/testdata.bam \
  -o exampledata/testdata/samplename \
  -g hg38 \
  --subseg \
  --juncread 5 \
  --SA_with_nm
```

### Output

The command generates ASK-style output:

```text
testdata/
├──  samplename_ask_amplicon_circular.tsv
├──  samplename_ask_amplicon_circular_stat.tsv
├──  samplename_ask_amplicon_linear.tsv
├──  samplename_ask_amplified_segment.tsv
├──  samplename_ask_bin_count.tsv
├──  samplename_ask_bin_count_norm.tsv
├──  samplename_ask_breakpoint.tsv
├──  samplename_ask_breakpoint_pair.tsv
├──  samplename_ask_breakpoint_pair_raw.tsv
├──  samplename_ask_breakpoint_seg.tsv
├──  samplename_ask_clip_count.bedgraph
├──  samplename_ask_cn_segmentation.tsv
├──  samplename_ask_sc_support_matrix.tsv
├──  samplename_ask_sc_normal_alignment_matrix.tsv
├──  samplename_ask_junctionseq
│   ├──  circ_0.tsv
│   ├──  circ_1.tsv
│   ├──  circ_2.tsv
│   └──  circ_3.tsv
├──  samplename_ask_plot
│   ├──  ampseg_0.pdf
│   ├──  circular_circ_0.pdf
│   ├──  circular_circ_1.pdf
│   ├──  circular_circ_2.pdf
│   └──  circular_circ_3.pdf
├──  samplename_ask_stats.tsv
├──  samplename_ask_step1.pdat
├──  samplename_ask_step2.pdat
├──  samplename_ask_step3.pdat
└──  samplename_ask_step4.pdat


```

The single-cell matrix files are generated only when cell barcodes are detected in breakpoint-supporting reads.

### Segmentation mode

ASK uses `-d/--segmode` to specify the input data type for copy-number segmentation. The default, `standard`, suits whole-genome-like data; `bias` is recommended for coverage-biased assays such as ATAC-seq, scATAC-seq, ChIP-seq, WES, MNase-seq, and target-capture sequencing.

| Mode       | Recommended data types                                       | Description                                                  |
| ---------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| `standard` | WGS and low-bias whole-genome-like data, including ChIP-input/input control data | Uses raw read counts in genomic bins. This is the default mode. |
| `bias`     | Coverage-biased assays such as ATAC-seq, scATAC-seq, ChIP-seq, WES, MNase-seq, and target-capture sequencing | Uses sub-bin robust statistics and bias correction to reduce local coverage bias. |
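
The exact statistics behind `bias` mode are not described here. As a hedged illustration of why sub-bin statistics help, the sketch below contrasts a plain bin sum with a median of sub-bin counts scaled back to bin size: a single peak-like sub-bin inflates the sum but barely moves the median. Both function names are hypothetical, and ASK's bias correction may differ substantially.

```python
from statistics import median


def standard_bin_count(sub_bin_counts):
    """Raw-count style: the bin value is simply the total read count."""
    return sum(sub_bin_counts)


def robust_bin_count(sub_bin_counts):
    """Illustrative robust alternative: median sub-bin count scaled to bin size.

    A single sub-bin with a local pile-up (for example an ATAC-seq peak) barely
    moves the median, whereas it dominates the plain sum.
    """
    return median(sub_bin_counts) * len(sub_bin_counts)
```

For sub-bin counts `[10, 10, 10, 200]`, the sum is 230 while the scaled median is 40, much closer to the background level.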

For WGS or ChIP-input data, the default mode is usually sufficient:

```bash
ask -i sample.bam -o sample_ask/sample -g hg38 -d standard
```

For ATAC-seq, scATAC-seq, ChIP-seq, WES, MNase-seq, or target-capture data, use:

```bash
ask -i sample.bam -o sample_ask/sample -g hg38 -d bias
```

### How to prepare BAM file

Map FASTQ files to the genome:

```bash
# build the BWA index once beforehand: bwa index /path/to/hg38.fa
bwa_index=/path/to/hg38.fa

# paired end
bwa mem -t 5 ${bwa_index} test_R1.fastq.gz test_R2.fastq.gz | samtools view -b -o test_unsorted.bam -

# single end
bwa mem -t 5 ${bwa_index} test.fastq.gz | samtools view -b -o test_unsorted.bam -
```

Sort and mark duplicates:

```bash
samtools fixmate --threads 5 -m test_unsorted.bam - \
    | samtools sort --threads 5 -T ./ - \
    | samtools markdup --threads 5 -T ./ -S -s - test.bam
```

Make index:

```bash
samtools index test.bam
```

## Targeted ecDNA search

`ask-search` takes a known ecDNA structure and a query BAM. For the example data, first run the `ask` command above, then use its circular amplicon table as the known structure:

```bash
ask-search \
  --circular query_sample=exampledata/testdata/samplename_ask_amplicon_circular.tsv \
  --bam exampledata/testdata.bam \
  --genome hg38 \
  --min-junc-cnt 5 \
  -o exampledata/testdata_search/testdata_search
```

If running directly from the source tree:

```bash
python ask/ecDNA_search.py \
  --circular query_sample=exampledata/testdata/samplename_ask_amplicon_circular.tsv \
  --bam exampledata/testdata.bam \
  --genome hg38 \
  --min-junc-cnt 5 \
  -o exampledata/testdata_search/testdata_search
```

### What search mode does

`ask-search` is a targeted workflow:

1. Parse the known ecDNA structure.
2. Derive reference breakpoint pairs from the known segments.
3. Collect reads around relevant chromosomes and breakpoint neighborhoods.
4. Match observed breakpoint-pair evidence to the reference breakpoint pairs.
5. Reconstruct supported circular structures from the observed evidence.
6. Report ASK-style outputs and the Junction Concordance Score (JCS).
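
Step 4 can be pictured as a tolerance match between coordinates. The sketch below is a hypothetical helper, not ASK's matcher: it assumes an observed pair matches when both sides fall on the same chromosome within `--window` bp (default `200`) of a reference pair, in either orientation. ASK may additionally compare clip sides or junction sequence.

```python
def matches_reference(observed, reference, window=200):
    """Return True if an observed breakpoint pair lies within `window` bp of a reference pair.

    Pairs are (chrom1, coord1, chrom2, coord2). Both orientations are tried,
    because the same junction can be reported with its sides swapped.
    """
    def close(a, b):
        return a[0] == b[0] and abs(a[1] - b[1]) <= window

    obs_1, obs_2 = observed[:2], observed[2:]
    ref_1, ref_2 = reference[:2], reference[2:]
    return (close(obs_1, ref_1) and close(obs_2, ref_2)) or (
        close(obs_1, ref_2) and close(obs_2, ref_1)
    )
```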

### Parameters


| Parameter           | Required | Default             | Description                                                               |
| ------------------- | -------- | ------------------- | ------------------------------------------------------------------------- |
| `--circular`        | Yes      | -                   | Known ecDNA structure in `sample_id=known_ecDNA.tsv` format               |
| `--bam`             | Yes      | -                   | Query BAM file                                                            |
| `-o`, `--outdir`    | Yes      | -                   | Output directory                                                          |
| `--outprefix`       | No       | `outdir/<bam-stem>` | ASK-style output prefix                                                   |
| `--genome`          | No       | `hg38`              | Genome build for default annotation files                                 |
| `--target-genes`    | No       | None                | Optional comma-separated cancer genes used to filter reference structures |
| `--window`          | No       | `200`               | Breakpoint-neighborhood search window in bp                               |
| `--mapq`            | No       | `20`                | Minimum mapping quality                                                   |
| `--nmmax`           | No       | `1`                 | Maximum NM mismatch count                                                 |
| `--min-junc-cnt`    | No       | `1`                 | Minimum junction read count used before DFS circular reconstruction       |
| `--bpp-min-dist`    | No       | `50`                | Minimum same-chromosome breakpoint-pair distance in bp                    |
| `--jcs-min-support` | No       | `5`                 | Minimum supporting reads required to validate one reference junction      |
| `--min-jcs`         | No       | `0.5`               | Circle-level JCS detection threshold                                      |
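
The `--circular` value pairs a sample label with a table path. A minimal parser for that format, assuming the label is everything before the first `=`; `parse_circular_arg` is a hypothetical name, and ASK's own argument handling may differ.

```python
def parse_circular_arg(value):
    """Split a `sample_id=path` argument into (sample_id, path)."""
    # partition() splits at the first "=", so any "=" inside the path is kept.
    sample_id, sep, path = value.partition("=")
    if not sep or not sample_id or not path:
        raise ValueError(f"expected sample_id=path, got {value!r}")
    return sample_id, path
```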

### Output

The command generates ASK-style search output:

```text
ask_search/
├── known_breakpoint_seed.tsv
├── known_ecDNA_breakpoint_pairs.tsv
├── known_ecDNA_segments.tsv
├── sample_search_ask_alignment_sequence.tsv
├── sample_search_ask_amplicon_circular_new.tsv
├── sample_search_ask_amplicon_circular_stat_new.tsv
├── sample_search_ask_amplicon_linear.tsv
├── sample_search_ask_amplified_segment.tsv
├── sample_search_ask_bin_count.tsv
├── sample_search_ask_bin_count_norm.tsv
├── sample_search_ask_breakpoint_pair.tsv
├── sample_search_ask_breakpoint_pair_raw.tsv
├── sample_search_ask_breakpoint_seq.tsv
├── sample_search_ask_breakpoint.tsv
├── sample_search_ask_clip_count.bedgraph
├── sample_search_ask_cn_segmentation.tsv
├── sample_search_ask_jcs.tsv
├── sample_search_ask_sc_support_matrix.tsv
├── sample_search_ask_sc_normal_alignment_matrix.tsv
├── sample_search_ask_stats.tsv
├── sample_search_ask_step1.pdat
├── sample_search_ask_step2.pdat
├── sample_search_ask_step3.pdat
├── sample_search_ask_step4.pdat
├── sample_search_ask_junctionseq/
└── plot/
```

## Output files

| File or Directory                        | Generated by        | Description                                                     |
| ---------------------------------------- | ------------------- | --------------------------------------------------------------- |
| `*_ask_amplicon_circular.tsv`            | `ask`, `ask-search` | Candidate circular amplicon/ecDNA structures                    |
| `*_ask_amplicon_circular_stat.tsv`       | `ask`, `ask-search` | Summary statistics for circular amplicons                       |
| `*_ask_amplicon_linear.tsv`              | `ask`, `ask-search` | Candidate linear amplicon structures                            |
| `*_ask_amplified_segment.tsv`            | `ask`, `ask-search` | Amplified genomic segments inferred from copy number signal     |
| `*_ask_breakpoint.tsv`                   | `ask`, `ask-search` | Candidate breakpoint positions                                  |
| `*_ask_breakpoint_pair.tsv`              | `ask`, `ask-search` | Final breakpoint pairs used for amplicon reconstruction         |
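| `*_ask_breakpoint_pair_raw.tsv`          | `ask`, `ask-search` | Raw breakpoint-pair candidates before final filtering           |
| `*_ask_breakpoint_seg.tsv`               | `ask`               | Breakpoint-derived segments used during graph construction      |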
| `*_ask_breakpoint_seq.tsv`               | `ask`, `ask-search` | Breakpoint-associated sequence information                      |
| `*_ask_alignment_sequence.tsv`           | `ask`, `ask-search` | Read-level alignment sequence evidence for breakpoint junctions |
| `*_ask_junctionseq/`                     | `ask`, `ask-search` | Per-amplicon junction sequence files                            |
| `*_ask_bin_count.tsv`                    | `ask`, `ask-search` | Raw genomic bin counts                                          |
| `*_ask_bin_count_norm.tsv`               | `ask`, `ask-search` | Normalized bin counts for copy number estimation                |
| `*_ask_cn_segmentation.tsv`              | `ask`, `ask-search` | Copy number segmentation result                                 |
| `*_ask_clip_count.bedgraph`              | `ask`, `ask-search` | Soft-clipping evidence track                                    |
| `*_ask_sc_support_matrix.tsv`            | `ask`, `ask-search` | Single-cell junction-support matrix, generated only when barcodes are detected |
| `*_ask_sc_normal_alignment_matrix.tsv`   | `ask`, `ask-search` | Single-cell normal-alignment matrix, generated only when barcodes are detected |
| `*_ask_stats.tsv`                        | `ask`, `ask-search` | Run-level summary statistics                                    |
| `*_ask_step1.pdat` to `*_ask_step4.pdat` | `ask`, `ask-search` | Intermediate cache files                                        |
| `*_ask_jcs.tsv`                          | `ask-search`        | Junction Concordance Score summary                              |
| `known_ecDNA_segments.tsv`               | `ask-search`        | Parsed known ecDNA segments used as the search target           |
| `known_ecDNA_breakpoint_pairs.tsv`       | `ask-search`        | Reference breakpoint pairs derived from the known structure     |
| `known_breakpoint_seed.tsv`              | `ask-search`        | Breakpoint seed table used for targeted evidence collection     |
| `plot/`                                  | `ask`, `ask-search` | Amplicon visualization figures                                  |

## File formats

### Amplicon tables

`*_ask_amplicon_circular.tsv` and `*_ask_amplicon_linear.tsv` report reconstructed circular and linear amplicons. Each row is one segment assigned to an amplicon.

| Column | Description |
| ------ | ----------- |
| `Chrom` | Chromosome of the segment. |
| `Start`, `End` | Segment genomic coordinates. |
| `Strand` | Segment orientation in the reconstructed structure. |
| `SplitCount` | Number of split/junction-supporting reads associated with the segment. |
| `CN` | Segment copy number estimate. |
| `AmpliconID` | Reconstructed amplicon identifier, such as `circ_0` or `line_0`. |
| `Gene` | Genes overlapping the segment. |
| `CancerGene` | Cancer genes overlapping the segment. |
| `SE` | Super-enhancer annotations overlapping the segment. |
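
Because each row of the amplicon tables is one segment, per-amplicon summaries can be rebuilt by grouping on `AmpliconID`. A standard-library sketch, assuming the file is tab-separated with a header row using the column names above; `summarize_amplicons` is hypothetical, and its `Length` (sum of `End - Start`) may not match the convention used for `Length` in `*_ask_amplicon_circular_stat.tsv`.

```python
import csv
from collections import defaultdict
from statistics import mean


def summarize_amplicons(tsv_path):
    """Per-amplicon segment count, total length, and mean CN from an amplicon table."""
    segments = defaultdict(list)
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            segments[row["AmpliconID"]].append(row)

    summary = {}
    for amplicon_id, rows in segments.items():
        summary[amplicon_id] = {
            "Seg_num": len(rows),
            # End - Start per segment; ASK's own Length may use another convention.
            "Length": sum(int(r["End"]) - int(r["Start"]) for r in rows),
            "CN_mean": mean(float(r["CN"]) for r in rows),
        }
    return summary
```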

`*_ask_amplicon_circular_stat.tsv` summarizes each circular amplicon.

| Column | Description |
| ------ | ----------- |
| `AmpliconID` | Circular amplicon identifier. |
| `Chrom1`, `Start`, `Chrom2`, `End` | Outer genomic span used to summarize the amplicon. |
| `Seg_num` | Number of segments in the circle. |
| `Length` | Total segment length. |
| `SplitCount_sum`, `SplitCount_mean`, `SplitCount_std` | Junction-support read count summary across segments. |
| `CN_sum`, `CN_mean`, `CN_std` | Copy number summary across segments. |
| `FCleft_sum`, `FCright_sum` | Copy-number fold-change evidence at left and right boundaries. |
| `invCNCV_sum`, `invCNCV_mean` | Inverse copy-number coefficient-of-variation score; larger values indicate smoother CN. |
| `invSplitCV` | Inverse split-read coefficient-of-variation score. |
| `Gene_num` | Number of genes overlapping the amplicon. |
| `Cancergene_num` | Number of cancer genes overlapping the amplicon. |
| `SE_num` | Number of super-enhancer annotations overlapping the amplicon. |
| `FCleft_mean_1`, `FCright_mean_1` | Mean boundary fold-change values used in scoring. |
| `Score` | Final amplicon score. |
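
The table describes `invCNCV_*` and `invSplitCV` as inverse coefficient-of-variation scores. Assuming the usual definition (mean divided by standard deviation), a sketch looks like this; the use of the population standard deviation and the infinite value for zero spread are assumptions, not documented ASK behavior.

```python
from statistics import mean, pstdev


def inverse_cv(values):
    """Inverse coefficient of variation: mean / standard deviation.

    Returns infinity when all values are identical (zero spread).
    Uses the population standard deviation; ASK may use a different estimator.
    """
    sd = pstdev(values)
    return float("inf") if sd == 0 else mean(values) / sd
```

Smoother copy number gives a larger score: `[8, 12]` scores 5.0, while the rougher `[5, 15]` scores 2.0.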

### Copy number tables

`*_ask_bin_count.tsv` contains raw bin-level read counts.

| Column | Description |
| ------ | ----------- |
| `Chrom` | Chromosome. |
| `Coord` | Bin coordinate. |
| `Count` | Read count in the bin. |
| `CN` | Copy number estimate for the bin. |

`*_ask_bin_count_norm.tsv` contains normalized bin counts.

| Column | Description |
| ------ | ----------- |
| `Chrom`, `Coord`, `Count`, `CN` | Same as `*_ask_bin_count.tsv`, after normalization. |
| `Log2Ratio` | Log2 copy-number ratio used for segmentation. |

`*_ask_cn_segmentation.tsv` reports copy-number segments.

| Column | Description |
| ------ | ----------- |
| `Chrom`, `Start`, `End` | Copy-number segment coordinates. |
| `Count` | Segment-level read count summary. |
| `CN` | Segment copy number estimate. |
| `Log2Ratio` | Segment log2 copy-number ratio. |

`*_ask_amplified_segment.tsv` reports amplified segments selected from copy-number and breakpoint evidence.

| Column | Description |
| ------ | ----------- |
| `Chrom`, `Start`, `End` | Amplified segment coordinates. |
| `Count` | Segment read count summary. |
| `CN` | Segment copy number estimate. |
| `ClipLeft`, `ClipRight` | Clipped-read support at left and right segment boundaries. |
| `Gene` | Genes overlapping the segment. |
| `CancerGene` | Cancer genes overlapping the segment. |

`*_ask_breakpoint_seg.tsv` stores breakpoint-derived segments used during graph construction.

| Column | Description |
| ------ | ----------- |
| `Chrom`, `Start`, `End` | Breakpoint-derived segment coordinates. |
| `CN` | Copy number assigned to the segment. |

### Breakpoint tables

`*_ask_breakpoint.tsv` reports candidate breakpoint positions.

| Column | Description |
| ------ | ----------- |
| `Chrom` | Chromosome of the breakpoint. |
| `Coord` | Breakpoint coordinate. |
| `Clip` | Breakpoint side: `L` for left-clipped boundary or `R` for right-clipped boundary. |
| `CleanBP` | Whether the breakpoint passes clean-breakpoint filtering. |
| `ClipDepth` | Number of clipped reads supporting the breakpoint. |
| `InDepth`, `OutDepth` | Local read-depth summaries inside and outside the breakpoint. |

`*_ask_breakpoint_pair_raw.tsv` and `*_ask_breakpoint_pair.tsv` report breakpoint-pair evidence. The raw file contains candidate pairs before final reconstruction filtering; the final file contains pairs used in amplicon reconstruction.

| Column | Description |
| ------ | ----------- |
| `Chrom1`, `Coord1`, `Clip1` | First breakpoint side. |
| `Chrom2`, `Coord2`, `Clip2` | Second breakpoint side. |
| `Count` | Supporting read count for the breakpoint pair. |
| `offset` | Junction offset. Negative values indicate overlap between the two breakpoint-side sequences; positive values indicate an insertion/gap. |
| `Seq` | Junction sequence or support type such as `PE_Support`. |
| `Readbarcode` | Cell barcodes supporting the breakpoint pair. Empty lists indicate no single-cell barcode was detected. |
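
The sign convention for `offset` can be turned into a readable label. A small hypothetical helper following the description above; negative offsets are often interpreted as microhomology at the junction, and the label for `0` is an assumption because the table does not describe that case.

```python
def describe_junction_offset(offset):
    """Translate a breakpoint-pair `offset` into a short description.

    Negative: the two breakpoint-side sequences overlap.
    Positive: an inserted sequence or gap lies between them.
    """
    if offset < 0:
        return f"{-offset} bp overlap"
    if offset > 0:
        return f"{offset} bp insertion/gap"
    return "no overlap or insertion"
```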

### Single-cell matrix files

For single-cell assays, ASK automatically writes two barcode-level matrices when breakpoint-pair barcodes are detected. The two matrices always have the same rows and columns; missing values are filled with `0`.

| File | Description |
| ---- | ----------- |
| `*_ask_sc_support_matrix.tsv` | Junction-support matrix. Each row is a breakpoint pair and each barcode column stores the number of junction-supporting reads in that cell. |
| `*_ask_sc_normal_alignment_matrix.tsv` | Normal-alignment matrix. Each row is a breakpoint pair and each barcode column stores the number of normal reads spanning the breakpoint positions in that cell. |

| Column | Description |
| ------ | ----------- |
| `JunctionID` | Breakpoint-pair coordinate identifier, formatted as `Chrom1:Coord1:Clip1\|Chrom2:Coord2:Clip2`. |
| `AmpliconID` | Amplicon containing the junction. |
| `<barcode>` | One column per cell barcode. Values are read counts. |

Normal-alignment reads must span the breakpoint coordinate and must not be supplementary, secondary, SA-tagged, or clipped near the breakpoint.
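
`JunctionID` values can be split back into their two breakpoint sides, and because both matrices share rows and columns, the same cell can be looked up in each. The helpers below are hypothetical; the per-cell fraction is one possible downstream summary, not an ASK output. `rsplit` is used so that contig names containing `:` stay intact.

```python
def parse_junction_id(junction_id):
    """Split 'Chrom1:Coord1:Clip1|Chrom2:Coord2:Clip2' into two (chrom, coord, clip) tuples."""
    sides = []
    for side in junction_id.split("|"):
        chrom, coord, clip = side.rsplit(":", 2)
        sides.append((chrom, int(coord), clip))
    return tuple(sides)


def junction_fraction(support, normal):
    """Fraction of informative reads in one cell that support the junction.

    `support` and `normal` come from the same row and barcode column of the
    support and normal-alignment matrices. Returns None when the cell has no
    informative reads.
    """
    total = support + normal
    return None if total == 0 else support / total
```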

### Run summary

`*_ask_stats.tsv` is a plain-text run summary rather than a standard table. It records counts such as the number of amplified segments, breakpoint pairs, circular amplicons, and linear amplicons.

### JCS file

`*_ask_jcs.tsv` is generated by targeted search mode.

| Column                      | Description                                                          |
| --------------------------- | -------------------------------------------------------------------- |
| `CircleID`                  | Reference circle identifier                                          |
| `total_reference_junctions` | Number of reference junctions derived from the known ecDNA structure |
| `validated_junctions`       | Number of reference junctions supported in the query BAM             |
| `total_support_reads`       | Total supporting reads across validated junctions                    |
| `JCS`                       | Junction Concordance Score                                           |
| `Detected`                  | Whether the circle passes the JCS threshold                          |

JCS is computed as:

```text
JCS = validated reference junctions / total reference junctions
```

By default, a reference junction is considered validated when it has at least five supporting reads (`--jcs-min-support 5`), and a circle is marked detected when `JCS > 0.5` (`--min-jcs 0.5`).
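
A worked sketch of the JCS rule as described above. The returned keys mirror the `*_ask_jcs.tsv` columns, but `junction_concordance` is a hypothetical function, not ASK's implementation.

```python
def junction_concordance(support_counts, min_support=5, min_jcs=0.5):
    """Compute JCS for one reference circle.

    support_counts: supporting-read count for each reference junction of the circle.
    A junction is validated when its count is >= min_support (--jcs-min-support);
    the circle is detected when JCS > min_jcs (--min-jcs).
    """
    total = len(support_counts)
    validated = [c for c in support_counts if c >= min_support]
    jcs = len(validated) / total if total else 0.0
    return {
        "total_reference_junctions": total,
        "validated_junctions": len(validated),
        "total_support_reads": sum(validated),  # reads on validated junctions only
        "JCS": jcs,
        "Detected": jcs > min_jcs,
    }
```

For support counts `[12, 7, 3, 0]`, two of four junctions pass the read threshold, so JCS is 0.5 and, under the strict `>` comparison, the circle is not marked detected.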

## Algorithm overview

ASK reconstructs amplified structures from coverage and breakpoint evidence:

1. Alignment evidence extraction from an indexed BAM.
2. Read counting in genomic bins.
3. Copy number normalization and segmentation.
4. Amplified segment detection.
5. Breakpoint detection from clipping and supplementary-alignment evidence.
6. Breakpoint-pair construction.
7. Graph-based circular and linear amplicon reconstruction.
8. Gene, cancer gene, and super-enhancer annotation.
9. ASK-style visualization.

The targeted `ask-search` workflow follows the same evidence model but constrains the initial evidence collection using a known ecDNA structure.
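
Step 7, and the DFS reconstruction mentioned for `--min-junc-cnt`, can be illustrated with a toy cycle search. This is a deliberately simplified sketch, not ASK's algorithm: nodes are segment IDs, directed edges are junctions weighted by supporting reads, edges below the read threshold are dropped, and every simple cycle is reported once. Segment orientation, copy-number consistency, and scoring, which a real reconstruction needs, are ignored; `circular_paths` is a hypothetical name.

```python
def circular_paths(junctions, min_junc_cnt=1):
    """Enumerate simple cycles in a directed segment graph by depth-first search.

    junctions: {(from_segment, to_segment): supporting_read_count}
    Edges with fewer than min_junc_cnt reads are dropped first.
    Each cycle is reported once, starting from its smallest segment ID.
    """
    graph = {}
    for (src, dst), count in junctions.items():
        if count >= min_junc_cnt:
            graph.setdefault(src, []).append(dst)

    cycles = []

    def dfs(start, node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.append(list(path))  # closed a circle back to the start
            elif nxt > start and nxt not in path:
                # Only visit IDs larger than start so each cycle is found once.
                dfs(start, nxt, path + [nxt])

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles
```

With junctions `seg1->seg2` (8 reads), `seg2->seg3` (6), `seg3->seg1` (7), and `seg2->seg1` (2), a threshold of 5 leaves a single circle `seg1, seg2, seg3`; a threshold of 1 also admits the weaker two-segment circle `seg1, seg2`.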

## Checkpointing and modular usage

ASK writes intermediate `.pdat` files:

| File               | Stage                                       |
| ------------------ | ------------------------------------------- |
| `*_ask_step1.pdat` | Alignment evidence and bin counts           |
| `*_ask_step2.pdat` | Copy number and amplified segment detection |
| `*_ask_step3.pdat` | Breakpoint-pair detection                   |
| `*_ask_step4.pdat` | Amplicon reconstruction                     |

These files are useful for debugging, rerunning downstream steps, and comparing parameter choices. When rerunning from scratch, use a fresh output prefix or remove incompatible intermediate files.
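
To follow the advice about incompatible intermediate files, a cautious cleanup helper can list the checkpoint files for one prefix before anything is removed. This is a hypothetical sketch based only on the `*_ask_step1.pdat` to `*_ask_step4.pdat` naming; it deletes nothing unless `delete=True` is passed.

```python
from pathlib import Path


def stale_checkpoints(outprefix, delete=False):
    """List (and optionally delete) ASK step checkpoint files for one output prefix.

    Matches {outprefix}_ask_step1.pdat ... {outprefix}_ask_step4.pdat.
    With delete=False this is a dry run that only reports what it found.
    """
    prefix = Path(outprefix)
    found = sorted(prefix.parent.glob(f"{prefix.name}_ask_step[1-4].pdat"))
    if delete:
        for path in found:
            path.unlink()
    return found
```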

ASK can also be used modularly:

| Use Case                                   | Suggested Entry Point                                                 |
| ------------------------------------------ | --------------------------------------------------------------------- |
| Start from BAM                             | `ask`                                                                 |
| Start from known ecDNA structure           | `ask-search`                                                          |
| Compare one reference ecDNA across samples | Run `ask-search` once per query BAM                                   |
| Replot existing ASK outputs                | Use the generated circular, linear, copy number, and bin-count tables |
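
For the cross-sample comparison use case, the `ask-search` invocations can be generated programmatically. The sketch below only assembles argument lists from the flags documented above (`--circular`, `--bam`, `--genome`, `-o`); the function name, the per-BAM output directory layout, and the example paths are hypothetical.

```python
import shlex
from pathlib import Path


def ask_search_commands(reference_tsv, query_bams, outroot, genome="hg38", sample_id="reference"):
    """Build one ask-search command per query BAM, each with its own output directory."""
    commands = []
    for bam in query_bams:
        outdir = Path(outroot) / Path(bam).stem
        commands.append([
            "ask-search",
            "--circular", f"{sample_id}={reference_tsv}",
            "--bam", str(bam),
            "--genome", genome,
            "-o", str(outdir),
        ])
    return commands


if __name__ == "__main__":
    for cmd in ask_search_commands(
        "sampleA_ask/sampleA_ask_amplicon_circular.tsv",
        ["sampleB.bam", "sampleC.bam"],
        "search_results",
    ):
        # Print for review; pass each list to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```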

## License

ASK is released under the MIT License. See [LICENSE](https://github.com/nanawei11/AmpliconSeeK/blob/main/LICENSE) for details.

## Contact

For questions and feedback, please open an issue on [GitHub](https://github.com/nanawei11/AmpliconSeeK/issues) or contact Nana Wei (nnwei@shsmu.edu.cn).
