Metadata-Version: 2.4
Name: variant-mapper
Version: 0.1.1b0
Summary: Map genetic variants to rsIDs
Author-email: Chris Finan <c.finan@ucl.ac.uk>
License-Expression: GPL-3.0-or-later
Project-URL: Homepage, https://cfinan.gitlab.io/variant-mapper/index.html
Project-URL: Repository, https://gitlab.com/cfinan/variant-mapper
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: <3.14,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: biopython
Requires-Dist: cfin-merge-sort
Requires-Dist: cfin-pyaddons
Requires-Dist: ensembl-rest-client
Requires-Dist: genomic-config
Requires-Dist: multi-join
Requires-Dist: python3-wget
Requires-Dist: rapidgzip
Requires-Dist: stdopen
Requires-Dist: portalocker
Requires-Dist: pysam
Requires-Dist: tqdm
Requires-Dist: zstandard
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-dependency>=0.5; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: bump2version; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: build; extra == "dev"
Dynamic: license-file

# variant-mapper

__version__: `0.1.1b0`

The variant-mapper is a package to map genetic variants to the genome, in order to validate them and assign an rs ID.

This will:

1. Localise the genetic variant based on chromosome position, using either a file join approach or tabix.
2. Determine if the ref/alt alleles match a known variant site. It assumes that ref/alt can be flipped.
3. If a site can be identified then it will annotate the variant with function information.
4. If no site can be identified, the steps below are tried in turn.
5. If an INDEL, normalise the alleles and attempt mapping again.
6. Finally, if it still can't be mapped, validate one of the alleles against the reference genome assembly.
7. This can also handle cases where only a single allele is known, assuming the site is bi-allelic and the ref allele can be localised.
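
To make the allele-matching and INDEL steps above more concrete, here is a minimal sketch in Python. It is not the package's implementation: `match_alleles` and `trim_indel` are hypothetical names, and the trimming shown is a simplification that only removes shared flanking bases. Full normalisation would also left-align the variant against the reference sequence, which is not attempted here.

```python
from typing import Optional, Tuple


def match_alleles(query: Tuple[str, str], site: Tuple[str, str]) -> Optional[str]:
    """Compare a (ref, alt) pair with a known site (hypothetical helper).

    Returns "exact" if the alleles agree, "flipped" if ref and alt are
    swapped, and None if they do not match in either orientation.
    """
    q_ref, q_alt = (allele.upper() for allele in query)
    s_ref, s_alt = (allele.upper() for allele in site)
    if (q_ref, q_alt) == (s_ref, s_alt):
        return "exact"
    if (q_ref, q_alt) == (s_alt, s_ref):
        return "flipped"
    return None


def trim_indel(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim bases shared by both alleles (assumed simplification).

    Shared trailing bases are removed first, then shared leading bases,
    always leaving at least one base in each allele. The position is
    advanced by one for every leading base removed.
    """
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

In a workflow like the one described, a variant that fails `match_alleles` would be passed through something like `trim_indel` and then matched again before falling back to the reference genome check.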

The mapper works by having a common mapper file and a full mapper file. The common mapper file contains common variants usually used in GWAS studies, and the full mapper file has all known variants from dbSNP and from other projects as well.

You can either map by localising the genetic variants using tabix or by a table scan (file join) approach. The file join is most efficient when your input file contains millions of variants (roughly 10-20M). In this case the common file is used for the join, and where something can't be mapped a tabix query is tried against the full file. In many cases the common file is good enough, but it might miss some variants. In any case, please contact me for a download link. There is nothing super secret about the mapping file; UCL does not offer any file distribution and I have no other official way of distributing it, so it is on my personal pCloud at the moment.
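
The join-then-fallback idea can be sketched as a merge join over two position-sorted streams. This is an illustration only, under several assumptions: `join_with_fallback` is a hypothetical name, chromosomes are encoded as integers so that keys sort consistently, the common stream holds one record per position, and `full_lookup` stands in for whatever indexed query (for example tabix) is run against the full file.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# (chromosome as an integer code, 1-based position); an assumed encoding
Key = Tuple[int, int]


def join_with_fallback(
    queries: Iterable[Key],
    common: Iterable[Tuple[Key, str]],
    full_lookup: Callable[[Key], Optional[str]],
) -> Iterator[Tuple[Key, Optional[str], str]]:
    """Merge-join sorted queries against a sorted common stream.

    Both inputs must be sorted by key. Queries with no match in the
    common stream are passed to ``full_lookup`` (hypothetical callback).
    Yields (key, rs ID or None, source) for every query.
    """
    common_iter = iter(common)
    current = next(common_iter, None)
    for key in queries:
        # Advance the common stream until it reaches or passes the query key
        while current is not None and current[0] < key:
            current = next(common_iter, None)
        if current is not None and current[0] == key:
            yield key, current[1], "common"
        else:
            rsid = full_lookup(key)
            yield key, rsid, "full" if rsid is not None else "unmapped"
```

Because each stream is read once from start to finish, the cost of the join grows with the size of the two files rather than with the number of random lookups, which is why this kind of scan tends to pay off for very large inputs.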
    
## Installation
This can be installed from PyPI (using pip) or with conda.

To install from PyPI:
```
pip install variant-mapper
```

To install using conda:
```
conda install -c cfin -c conda-forge variant-mapper
```

## Documentation
There is [online documentation](https://cfinan.gitlab.io/variant-mapper/index.html) for variant-mapper.
