Metadata-Version: 2.4
Name: sjcab_peak2anno_db
Version: 0.1.1
Summary: Gene-backed peak-to-annotation BED resources for SJ/CAB workflows.
Author: SJ/CAB
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.7
Description-Content-Type: text/markdown

# sjcab_peak2anno_db

Gene-backed peak-to-annotation BED resources for SJ/CAB workflows.

To keep the distribution small, the package bundles only `gene` BED files as
annotation sources. These derived annotations are generated from each bundled
gene version:

- `tss`: 1 bp TSS intervals, strand-aware
- `tes`: 1 bp TES intervals, strand-aware
- `deduplong`: longest isoform per gene name, using column 4 as gene name and
  column 5 as isoform length
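
As a rough illustration of what "1 bp, strand-aware" means for `tss` and `tes`,
here is a minimal sketch. It assumes standard BED conventions (0-based,
half-open coordinates, strand in column 6); the package's exact coordinate
handling is not documented here, and the function names below are hypothetical
rather than part of the package API.

```python
def tss_interval(start, end, strand):
    """Return the 1 bp TSS as a 0-based, half-open (start, end) pair."""
    # Assumption: on the minus strand the TSS is the last base of the feature.
    if strand == "-":
        return end - 1, end
    return start, start + 1


def tes_interval(start, end, strand):
    """Return the 1 bp TES, i.e. the opposite end of the feature from the TSS."""
    if strand == "-":
        return start, start + 1
    return end - 1, end


def to_point_bed_line(line, which="tss"):
    """Rewrite one tab-separated BED6 line as a 1 bp TSS or TES line."""
    fields = line.rstrip("\n").split("\t")
    start, end, strand = int(fields[1]), int(fields[2]), fields[5]
    pick = tss_interval if which == "tss" else tes_interval
    fields[1], fields[2] = map(str, pick(start, end, strand))
    return "\t".join(fields)
```

For a minus-strand gene spanning `100-200`, this sketch places the TSS at
`199-200` and the TES at `100-101`.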

The package also bundles blacklist BED files. During install, each blacklist is
written under `~/.sjcab_peak2anno_db/blacklists` as a dated
`*.bed.20230411` file, and the undated `*.bed` name is refreshed as a symlink.
CpG island (CGI) BED files for `hg38`, `hg19`, `mm10`, `mm9`, and `mm39` are
bundled as well and are installed into `~/.sjcab_peak2anno_db/cgi`.
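
The dated-file-plus-symlink pattern described above can be sketched as follows.
This is not the package's implementation: `install_dated` and its parameters
are hypothetical, the sketch assumes the symlink targets the dated file, and it
assumes a POSIX filesystem where symlinks are available.

```python
import os
import shutil
from pathlib import Path


def install_dated(src, dest_dir, name, date_tag):
    """Copy src to dest_dir/name.date_tag and point dest_dir/name at it."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dated = dest_dir / f"{name}.{date_tag}"
    shutil.copyfile(src, dated)

    link = dest_dir / name
    tmp = dest_dir / f".{name}.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # clear a leftover temporary link from an earlier run
    # A relative target keeps the link valid if the directory is moved.
    os.symlink(dated.name, tmp)
    # Renaming over the old link replaces it in one step on POSIX systems.
    os.replace(tmp, link)
    return link
```

Creating the link under a temporary name and renaming it means readers never
see a moment where the `*.bed` name is missing.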

Supported species:

- `hg19`
- `hg38`
- `mm10`
- `mm9`
- `sacCer3`

## Install

From TestPyPI:

```bash
python -m pip install -i https://test.pypi.org/simple/ sjcab_peak2anno_db
```

From a local checkout:

```bash
python -m pip install .
```

## Generate User Data

After installing the package, generate all available versions under the default
data directory, `~/.sjcab_peak2anno_db`:

```bash
sjcab-peak2anno-db install
```

That command installs the packaged gene BEDs, blacklists, and CGI BEDs into the
data directory.

To use a different data directory, set `SJCAB_PEAK2ANNO_DB_PATH`:

```bash
export SJCAB_PEAK2ANNO_DB_PATH=/path/to/sjcab_peak2anno_db
sjcab-peak2anno-db install
```

You can also pass `--data-dir` for one command:

```bash
sjcab-peak2anno-db install --data-dir /path/to/sjcab_peak2anno_db
```
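
One plausible way to combine these three sources is sketched below. The
precedence shown (explicit `--data-dir`, then `SJCAB_PEAK2ANNO_DB_PATH`, then
`~/.sjcab_peak2anno_db`) is an assumption, and `resolve_data_dir` is a
hypothetical helper, not a function exported by the package.

```python
import os
from pathlib import Path

DEFAULT_DATA_DIR = Path.home() / ".sjcab_peak2anno_db"


def resolve_data_dir(cli_value=None, environ=None):
    """Pick the data directory: CLI value, then env var, then the default."""
    environ = os.environ if environ is None else environ
    if cli_value:
        # Assumed: an explicit --data-dir wins over the environment.
        return Path(cli_value).expanduser()
    env_value = environ.get("SJCAB_PEAK2ANNO_DB_PATH")
    if env_value:
        return Path(env_value).expanduser()
    return DEFAULT_DATA_DIR
```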

Generated files are written as:

```text
{data_dir}/{species}/{annotation}/{version}.bed
{data_dir}/{species}/{annotation}/default.bed
{data_dir}/blacklists/{name}.bed.20230411
{data_dir}/blacklists/{name}.bed
```

`default.bed` points to the latest parsed version for that species and
annotation.
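
To make the layout concrete, the sketch below builds the expected paths from
their parts. The helpers `annotation_path` and `blacklist_path` are
hypothetical and only mirror the templates above; in practice the package's own
`path` function is the way to locate installed files.

```python
from pathlib import Path


def annotation_path(data_dir, species, annotation, version=None):
    """Build the expected BED path; version=None selects default.bed."""
    filename = "default.bed" if version is None else f"{version}.bed"
    return Path(data_dir) / species / annotation / filename


def blacklist_path(data_dir, name, date_tag=None):
    """Build the expected blacklist path; date_tag=None selects the *.bed name."""
    filename = f"{name}.bed" if date_tag is None else f"{name}.bed.{date_tag}"
    return Path(data_dir) / "blacklists" / filename
```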

To download UCSC CpG island BED files for `hg38`, `hg19`, `mm10`, `mm9`, and
`mm39`, run:

```bash
sjcab-peak2anno-db download-cgi
```

CGI files are written as:

```text
{data_dir}/cgi/{species}_cgi.bed
```

## Python Usage

```python
import sjcab_peak2anno_db as db

print(db.supported_species())
print(db.versions("hg38", "gene"))
print(db.default_version("hg19", "tss"))

db.install_data()
print(db.path("hg38", "tss"))
```

Generate individual derived files manually from a gene BED file:

```python
import sjcab_peak2anno_db as db

db.write_tss("genes.bed", "genes.tss.bed")
db.write_tes("genes.bed", "genes.tes.bed")
db.write_deduplong("genes.bed", "genes.deduplong.bed")
```
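
For intuition about the `deduplong` rule (longest isoform per gene name, with
column 4 as the gene name and column 5 as the isoform length), here is a
minimal sketch. It is not the code behind `write_deduplong`: `dedup_longest` is
a hypothetical name, and the tie-breaking and output order are assumptions.

```python
def dedup_longest(lines):
    """Keep, per gene name (column 4), the BED line with the largest column 5."""
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        name, length = fields[3], int(fields[4])
        # Strictly greater: on ties the first line seen is kept (an assumption).
        if name not in best or length > best[name][0]:
            best[name] = (length, line)
    # Dicts preserve insertion order, so genes come out in first-seen order.
    return [line for _, line in best.values()]
```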

## Command Line

```bash
sjcab-peak2anno-db list
sjcab-peak2anno-db install
sjcab-peak2anno-db install-blacklists
sjcab-peak2anno-db download-cgi
sjcab-peak2anno-db update
sjcab-peak2anno-db path hg38 gene
sjcab-peak2anno-db path hg38 tss --install
```
