Metadata-Version: 2.4
Name: cellitac
Version: 1.0.1
Summary: Cell type identification using Transcription factor Analysis and Chromatin accessibility
Author-email: "Rana H. Abu-Zeid" <ranahamed2111@gmail.com>, Olaitan Awe <laitanawe@gmail.com>, Syrus Semawule <semawulesyrus@gmail.com>, Emmanuel Aroma <emmatitusaroma@gmail.com>, Toheeb Jumah <jumahtoheeb@gmail.com>, Derek Reiman <dreiman@ttic.edu>
License: MIT
Project-URL: Homepage, https://github.com/omicscodeathon/cellitac/
Keywords: single-cell,scATAC-seq,scRNA-seq,multiome,cell-type-identification,transcription-factor,chromatin-accessibility,machine-learning
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: openpyxl>=3.1
Requires-Dist: rpy2>=3.5
Requires-Dist: scikit-learn>=1.3
Requires-Dist: xgboost>=2.0
Requires-Dist: imbalanced-learn>=0.11
Requires-Dist: sklearn-compat>=0.1.5
Requires-Dist: matplotlib>=3.7
Requires-Dist: seaborn>=0.12
Requires-Dist: plotly>=5.18
Requires-Dist: networkx>=3.1
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

# cellitac

Single-Cell ATAC + RNA Multiome Processing & ML Classification Pipeline

---

## What It Does

| Stage | Steps | Tools |
|-------|-------|-------|
| **Preprocessing** | RNA QC → normalization → cell-type annotation | Seurat + SingleR (R via rpy2) |
| **Preprocessing** | ATAC QC → TF-IDF → LSI | Signac (R via rpy2) |
| **Preprocessing** | RNA + ATAC integration → ML-ready CSVs | Pure Python |
| **ML** | Imbalance analysis → SMOTE → feature selection | scikit-learn, imbalanced-learn |
| **ML** | RF + XGBoost + SVM training & evaluation | scikit-learn, xgboost |
| **ML** | 19 plots + JSON report + XLSX | matplotlib, seaborn, networkx |

---



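## Installation

### Option A – From source

```bash
# Install R packages (run once; requires R on your PATH)
Rscript -e "
  install.packages('BiocManager', repos = 'https://cloud.r-project.org')
  BiocManager::install(c(
    'Seurat', 'Signac', 'SingleR', 'celldex',
    'SingleCellExperiment', 'GenomicRanges',
    'EnsDb.Hsapiens.v75', 'biovizBase', 'hdf5r'
  ))
"

# Install the Python package in editable mode with the dev extras
# (run from the repository root)
pip install -e ".[dev]"
```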

### Option B – PyPI

```bash
pip install cellitac
# R and the R packages listed above must be installed separately
```

### Option C – Docker (recommended for full reproducibility)

```bash
docker build -t cellitac:1.0.1 -f docker/Dockerfile .

docker run --rm \
  -v /your/data:/data \
  -v "$(pwd)/results":/results \
  cellitac:1.0.1 \
  --input /data --output /results
```

---

## Data Download

Download the 10x Genomics PBMC 10k multiome dataset (healthy donor, no cell sorting):

https://www.10xgenomics.com/datasets/pbmc-from-a-healthy-donor-no-cell-sorting-10-k-1-standard-1-0-0

Required files (place in your `--input` directory):
```
pbmc_unsorted_10k_filtered_feature_bc_matrix.h5
pbmc_unsorted_10k_per_barcode_metrics.csv
pbmc_unsorted_10k_atac_fragments.tsv.gz
pbmc_unsorted_10k_atac_fragments.tsv.gz.tbi
pbmc_unsorted_10k_atac_peaks.bed
```
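
Before launching a long run, it can help to confirm that the input directory is complete. The following is a minimal sketch and not part of cellitac: `missing_inputs` is a hypothetical helper, and only the file names are taken from the list above.

```python
from pathlib import Path

# File names copied from the required-files list above.
REQUIRED_FILES = (
    "pbmc_unsorted_10k_filtered_feature_bc_matrix.h5",
    "pbmc_unsorted_10k_per_barcode_metrics.csv",
    "pbmc_unsorted_10k_atac_fragments.tsv.gz",
    "pbmc_unsorted_10k_atac_fragments.tsv.gz.tbi",
    "pbmc_unsorted_10k_atac_peaks.bed",
)


def missing_inputs(input_dir):
    """Return the required file names that are absent from input_dir."""
    root = Path(input_dir).expanduser()  # allow paths such as ~/singlecell/ATAC
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_inputs("~/singlecell/ATAC")` returns an empty list when all five files are in place, and otherwise the names that still need to be downloaded.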

---

## Usage

### Command Line

```bash
# Full pipeline (preprocessing + ML)
cellitac --input ~/singlecell/ATAC --output my_results

# Preprocessing only (generates python_ready_data/)
cellitac-preprocess --input ~/singlecell/ATAC --output my_results

# ML only (if you already have python_ready_data/)
cellitac-model --data my_results/python_ready_data --output my_results/ml
```

### Python API

```python
from cellitac import run_full_pipeline, run_preprocessing, run_model

# Full pipeline
run_full_pipeline(input_dir="~/singlecell/ATAC", output_dir="my_results")

# Preprocessing only
run_preprocessing(input_dir="~/singlecell/ATAC", output_dir_python="python_ready_data")

# ML only
run_model(data_dir="python_ready_data", output_dir="ml_results")

# Use the ML class directly for more control
from cellitac.mainModel import scATACMLPipeline
pipeline = scATACMLPipeline(data_dir="python_ready_data", output_dir="ml_results")
pipeline.run_complete_pipeline()
```

### Environment Variables

The input and ML output directories can also be supplied through environment variables:

```bash
export SCATAC_INPUT_DIR=~/singlecell/ATAC
export SCATAC_OUT_ML=ml_results
cellitac
```
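
The same variables can be set from Python before the pipeline starts, for example in a notebook. This is a sketch under assumptions: how cellitac reads `SCATAC_INPUT_DIR` and `SCATAC_OUT_ML` internally is not described here, and `configure_env` is a hypothetical helper that only fills in values the shell has not already exported.

```python
import os


def configure_env(input_dir, ml_output_dir, environ=os.environ):
    """Set the two variables shown above unless they are already defined.

    Returns the (input, ML output) values that end up in effect.
    """
    # setdefault keeps any value that was already exported in the shell.
    environ.setdefault("SCATAC_INPUT_DIR", os.path.expanduser(input_dir))
    environ.setdefault("SCATAC_OUT_ML", ml_output_dir)
    return environ["SCATAC_INPUT_DIR"], environ["SCATAC_OUT_ML"]
```

Passing a plain `dict` as `environ` makes the helper easy to try out without touching the real process environment.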

---

## Output Files

### ml_results/
| File | Description |
|------|-------------|
| `ml_pipeline_report.json` | Full JSON report |
| `model_performance_summary.csv` | Accuracy/F1/AUC per model |
| `detailed_model_results.xlsx` | Per-class metrics, CV results |
| `model_performance_comparison.png` | Bar chart comparison |
| `confusion_matrices.png` | Confusion matrices |
| `class_distribution_analysis.png` | Cell type distribution |
| `class_balancing_comparison.png` | Before/after SMOTE |
| `feature_importance.png` | RF + XGBoost top 20 features |
| `simple_feature_heatmap.png` | Feature importance heatmap |
| `overfitting_analysis.png` | CV train vs validation |
| `learning_curves.png` | Learning curves per model |
| `performance_radar.png` | Radar chart |
| `feature_distributions.png` | Violin plots |
| `class_separation_pca.png` | PCA scatter |
| `basic_tf_network.png` | Feature–cell-type network |

---

## Package Structure

```
cellitac/
├── src/cellitac/
│   ├── __init__.py          # Public API
│   ├── _version.py
│   ├── config.py            # All parameters (paths, QC thresholds, ML hyperparams)
│   ├── pipeline.py          # run_preprocessing, run_model, run_full_pipeline
│   ├── preprocessing.py     # R preprocessing via rpy2
│   ├── mainModel.py         # scATACMLPipeline class (19-step ML pipeline)
│   ├── cli.py               # cellitac / cellitac-preprocess / cellitac-model
│   └── rscripts/
│       ├── team1_rna.R      # Exact Seurat + SingleR code
│       └── team2_atac.R     # Exact Signac code
├── tests/
│   └── test_model.py
├── pyproject.toml
└── README.md
```

---

## Tests

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

---

## License

MIT
