Metadata-Version: 2.1
Name: darkprofiler
Version: 0.1.0
Summary: DarkProfiler: Alignment and Classification of Peptides from Reference-Independent De Novo Peptide Sequencing Experiments.
Author-email: Hanjun Lee <hanjun@alum.mit.edu>
License: MIT
Keywords: proteomics,immunopeptidomics,neoantigen,bioinformatics
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: biopython>=1.78
Requires-Dist: matplotlib>=3.3

# DarkProfiler

**DarkProfiler: Alignment and Classification of Peptides from Reference-Independent De Novo Peptide Sequencing Experiments**

DarkProfiler takes peptide sequences in FASTA format (for example, from de novo sequencing) and classifies them into the following categories:

- **Canonical proteome**
- **Alternative splicing**
- **Neoantigens (SNV-derived mutanome)**
- **Alternative reading frame peptides**
- **Amino acid misincorporations**
- **Unknown / unaligned**

It supports human and mouse references: `hg19`, `hg38`, `mm10`, `mm39`.

---

## Installation

### Install with pip (PyPI)

```bash
pip install darkprofiler
```

### Install with conda (bioconda)

```bash
conda install bioconda::darkprofiler
```

---

## Reference genome

Each supported genome assembly is listed below with its GENCODE annotation release:

```
hg19 (GENCODE release 19)
hg38 (GENCODE release 37)
mm10 (GENCODE release M19)
mm39 (GENCODE release M37)
```
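
If you want to validate a reference name in your own scripts before calling DarkProfiler, the table above can be mirrored as a small lookup. This is only a sketch: `GENCODE_RELEASES` and `check_reference` are hypothetical names for illustration and are not part of the `darkprofiler` package.

```python
# Assembly-to-release pairs copied from the list above.
# GENCODE_RELEASES and check_reference are illustrative names,
# not part of the darkprofiler package.
GENCODE_RELEASES = {
    "hg19": "19",
    "hg38": "37",
    "mm10": "M19",
    "mm39": "M37",
}


def check_reference(name: str) -> str:
    """Return the GENCODE release for a supported assembly, or raise ValueError."""
    try:
        return GENCODE_RELEASES[name]
    except KeyError:
        supported = ", ".join(sorted(GENCODE_RELEASES))
        raise ValueError(
            f"Unsupported reference {name!r}; expected one of: {supported}"
        ) from None
```

Calling `check_reference("hg38")` returns `"37"`, while an unlisted name such as `"hg18"` raises `ValueError`.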

---

## Command-line usage

### Download reference data

```bash
darkprofiler download hg38
```

### Run classification

```bash
darkprofiler run hg38 peptides.fa output_dir
```

Optional flags:

```
--vcf-path FILE
--database-path DIR
--num-threads N
```
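
To drive the command line from a pipeline script, one option is to assemble the argument list in Python and hand it to `subprocess`. The sketch below assumes that each optional flag takes a single value and may follow the positional arguments; `build_run_command` is a hypothetical helper, not something DarkProfiler provides.

```python
from typing import List, Optional


def build_run_command(
    reference: str,
    peptide_fasta: str,
    output_dir: str,
    vcf_path: Optional[str] = None,
    database_path: Optional[str] = None,
    num_threads: Optional[int] = None,
) -> List[str]:
    """Assemble a `darkprofiler run` argument list (illustrative helper)."""
    # Positional order follows the usage line shown above.
    cmd = ["darkprofiler", "run", reference, peptide_fasta, output_dir]
    # Assumption: each optional flag is followed by exactly one value.
    if vcf_path is not None:
        cmd += ["--vcf-path", vcf_path]
    if database_path is not None:
        cmd += ["--database-path", database_path]
    if num_threads is not None:
        cmd += ["--num-threads", str(num_threads)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems when paths contain spaces.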

---

## Python API

```python
from darkprofiler.run import classify_peptides

classify_peptides(
    reference="hg38",
    peptide_fasta="peptides.fa",
    output_dir="output",
    vcf_path=None,
    database_path=None,
    num_threads=4
)
```
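
The call above expects a peptide FASTA file on disk. If your peptides are held in a Python list, a minimal way to produce such a file is sketched below. It assumes a plain one-record-per-peptide FASTA layout; the `peptide_N` header scheme and the `write_peptide_fasta` helper are illustrative choices, and whether DarkProfiler expects a particular header convention is not stated here.

```python
from typing import Iterable


def write_peptide_fasta(peptides: Iterable[str], path: str) -> int:
    """Write peptides as FASTA records and return how many were written.

    Illustrative helper: headers are numbered peptide_1, peptide_2, ...
    """
    count = 0
    with open(path, "w") as handle:
        for count, peptide in enumerate(peptides, start=1):
            # Normalize surrounding whitespace and letter case.
            sequence = peptide.strip().upper()
            handle.write(f">peptide_{count}\n{sequence}\n")
    return count
```

For example, `write_peptide_fasta(["SIINFEKL", "GILGFVFTL"], "peptides.fa")` writes two records, and the resulting path can be passed as `peptide_fasta`.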

---

## Outputs

- `canonicalProteome.fa`
- `alternativeSplicing.fa`
- `neoantigen.fa`
- `alternativeReadingFrame.fa`
- `aminoAcidMisincorporation.fa`
- `unknown.fa`
- `pieChart.tsv`
- `pieChart.pdf`
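
As a quick sanity check after a run, the FASTA outputs listed above can be tallied with the standard library. The sketch assumes each file holds one `>` header line per peptide and treats a missing file as zero records; `summarize_outputs` is a hypothetical helper, and the layout of `pieChart.tsv` is not assumed or parsed.

```python
from pathlib import Path
from typing import Dict

# FASTA file names taken from the output list above.
OUTPUT_FASTA_FILES = [
    "canonicalProteome.fa",
    "alternativeSplicing.fa",
    "neoantigen.fa",
    "alternativeReadingFrame.fa",
    "aminoAcidMisincorporation.fa",
    "unknown.fa",
]


def count_fasta_records(path: Path) -> int:
    """Count header lines (those starting with '>') in a FASTA file."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_outputs(output_dir: str) -> Dict[str, int]:
    """Map each expected FASTA file name to its record count (illustrative)."""
    counts = {}
    for filename in OUTPUT_FASTA_FILES:
        path = Path(output_dir) / filename
        # Assumption: a file that was not written means no peptides in that class.
        counts[filename] = count_fasta_records(path) if path.exists() else 0
    return counts
```

Calling `summarize_outputs("output_dir")` returns a dictionary keyed by file name, which is convenient for logging or for comparing runs.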

---

## License

MIT License  
Copyright (c) 2025  
