Metadata-Version: 2.4
Name: vdjmatch
Version: 0.0.1
Summary: Fast, control-calibrated annotation of T-cell receptor antigen specificity
Project-URL: Homepage, https://github.com/antigenomics/vdjmatch
Project-URL: Repository, https://github.com/antigenomics/vdjmatch
Project-URL: Documentation, https://antigenomics.github.io/vdjmatch/
Author-email: ISALGO laboratory <mikhail.shugay@gmail.com>
License: GPL-3.0-or-later
License-File: LICENSE
Keywords: AIRR,E-value,TCR,VDJdb,antigen-specificity,immunology,repertoire
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Requires-Dist: polars>=1.0
Requires-Dist: seqtree>=0.0.3b1
Provides-Extra: bench
Requires-Dist: psutil; extra == 'bench'
Provides-Extra: control
Requires-Dist: seqtree[control]; extra == 'control'
Provides-Extra: ods
Requires-Dist: ezodf; extra == 'ods'
Provides-Extra: test
Requires-Dist: hypothesis; extra == 'test'
Requires-Dist: pytest>=7.4; extra == 'test'
Description-Content-Type: text/markdown

# vdjmatch

Fast, control-calibrated annotation of **T-cell receptor antigen specificity**.

`vdjmatch` annotates clonotypes in large AIRR repertoires against [VDJdb](https://github.com/antigenomics/vdjdb-db)
by fuzzy CDR3 search, reporting a **control-calibrated E-value** (BLAST-style significance against a
background repertoire) and enriched antigen-specificity labels. It is a Python rewrite of the legacy
Java/Groovy vdjmatch, built on the [`seqtree`](https://github.com/antigenomics/seqtree) search core.

> **Status:** 2.0 alpha, under active development on `dev`. The legacy Java tool is preserved on the
> `legacy-java` branch (tags `1.1.4`–`1.3.1`).

## Features (target)

- Fetch the latest VDJdb release and annotate AIRR Rearrangement / Cell (paired α/β) samples.
- Extremely fast, multithreaded search of million-scale repertoires (via `seqtree`).
- Control-calibrated **E-values**: single-chain available now; paired α/β and single-chain-paired estimates to follow.
- Custom substitution matrices, including segment-specific (V / NDN / J) scoring; the TCR-specific
  **VDJAM** matrix is bundled.
- Rich per-hit output: ranked hits, CIGAR + alignment match/gap, alignment scores, E-values.
- Epitope-level enrichment summaries; pairwise sample overlap.
- `polars` throughout for I/O.
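
To make the idea of a control-calibrated E-value concrete, here is a minimal, standard-library-only sketch. It is an illustration under stated assumptions, not the `vdjmatch` or `seqtree` API: the function names, the Hamming-distance matching rule, and the pseudocount are all hypothetical choices made for this example. The sketch estimates how often a database CDR3 matches a background (control) repertoire by chance, then scales that rate by the number of query clonotypes to get an expected number of chance hits.

```python
from collections.abc import Iterable


def hamming_within(a: str, b: str, max_mismatches: int) -> bool:
    """Return True if a and b have equal length and differ at no more
    than max_mismatches positions (a simple stand-in for fuzzy search)."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def control_calibrated_evalue(
    db_cdr3: str,
    control_cdr3s: Iterable[str],
    n_query: int,
    max_mismatches: int = 1,
    pseudocount: float = 1.0,
) -> float:
    """Expected number of chance hits to db_cdr3 among n_query clonotypes.

    ASSUMPTION: the per-clonotype chance-match probability is estimated as
    the (pseudocount-smoothed) fraction of control CDR3s that match db_cdr3;
    the real tool may calibrate differently.
    """
    control = list(control_cdr3s)
    hits = sum(hamming_within(db_cdr3, c, max_mismatches) for c in control)
    p_chance = (hits + pseudocount) / (len(control) + pseudocount)
    return p_chance * n_query


if __name__ == "__main__":
    # Toy control repertoire; the sequences are illustrative only.
    control = ["CASSLGQYEQYF", "CASSLGQYEQFF", "CASSPGTGELFF", "CAWSVGNTIYF"]
    print(control_calibrated_evalue("CASSLGQYEQYF", control, n_query=100))
```

As with BLAST, a small E-value means few matches of that quality are expected by chance, so a hit is more likely to reflect genuine similarity. The pseudocount keeps the estimate above zero when the control contains no matches.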

## Install (development)

```fish
python -m venv .venv
source .venv/bin/activate.fish
pip install -e '.[test,bench]'
```

`seqtree` (the search engine) is installed from PyPI as a dependency.

## License

GPL-3.0-or-later (it builds on `seqtree`, which is GPL-3.0-or-later).
