Metadata-Version: 2.4
Name: scblueprint
Version: 0.1.0
Summary: Rule-based cell-type annotation for single-cell RNA-seq with automatic resolution optimization
Author: dam2452
License-Expression: MIT
Project-URL: Homepage, https://github.com/dam2452/scblueprint
Project-URL: Issues, https://github.com/dam2452/scblueprint/issues
Keywords: single-cell,rna-seq,cell-type-annotation,clustering,bioinformatics,scanpy,leiden
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: scanpy>=1.10
Requires-Dist: anndata>=0.10
Requires-Dist: pyyaml>=6.0
Requires-Dist: pydantic>=2.0
Requires-Dist: matplotlib>=3.8
Requires-Dist: numpy>=1.26
Requires-Dist: igraph>=0.11
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pylint>=3.0; extra == "dev"
Requires-Dist: mypy>=1.8; extra == "dev"
Dynamic: license-file

<h1 align="center">scblueprint</h1>

<p align="center">
  <strong>Rule-based cell-type annotation for single-cell RNA-seq with automatic resolution optimization</strong>
</p>

<p align="center">
  <a href="https://pypi.org/project/scblueprint/"><img src="https://img.shields.io/pypi/v/scblueprint.svg" alt="PyPI"/></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License"/></a>
  <img src="https://img.shields.io/badge/python-3.11%2B-blue.svg" alt="Python 3.11+"/>
</p>

---

**scblueprint** is a Python library for reproducible, YAML-driven cell-type annotation of single-cell RNA-seq data. It scores gene signatures, optimizes Leiden clustering resolution, applies correction rules, and provides an explain mode that shows exactly why each cluster received its label.

## Features

- **YAML blueprints** - define cell types with positive/negative markers, colors, and references in a single file
- **Automatic resolution** - coarse-to-fine Leiden sweep that maximizes biological label diversity
- **Correction rules** - 4 built-in types (expression threshold, ontogeny override, coexpression required, mutually exclusive) plus custom Python rules
- **Explain mode** - every cluster label comes with score breakdowns and rule override chains
- **Labeling strategies** - majority vote or DE-gene overlap for cluster-level assignment
- **UMAP sweep** - multiprocessing parameter sweep with grid output
- **Presets** - `mouse_cardiac` with 30 literature-sourced cardiac cell types
- **Built on scanpy** - integrates with any scanpy/AnnData workflow
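
The majority-vote strategy in the list above can be pictured with a minimal sketch in plain Python. This is not scblueprint's implementation: the function name `majority_vote` and the toy cluster ids and labels are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def majority_vote(cluster_ids, cell_labels):
    """Give each cluster the most common per-cell label among its cells.

    Hypothetical helper for illustration; not part of scblueprint.
    """
    votes = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, cell_labels):
        votes[cluster][label] += 1
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


# Toy data: five cells in two clusters, each with a per-cell label.
clusters = ["0", "0", "0", "1", "1"]
labels = ["Cardiomyocyte", "Cardiomyocyte", "Fibroblast", "Fibroblast", "Fibroblast"]

cluster_labels = majority_vote(clusters, labels)
print(cluster_labels)
```

A DE-gene overlap strategy would instead compare each cluster's differentially expressed genes against the signature gene sets, which is why the two strategies can disagree on mixed clusters.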

## Installation

```bash
pip install scblueprint
```

## Quick Start

```python
import scanpy as sc
import scblueprint as scb

adata = sc.read_h5ad("my_data.h5ad")

# Load the built-in mouse cardiac preset
bp = scb.Blueprint.from_preset("mouse_cardiac")

# Find the Leiden resolution that maximizes biological label diversity
opt = scb.LeidenOptimizer()
result = opt.find_optimal(adata, bp.signatures, "leiden",
                          negative_markers=bp.negative_markers)
print(f"Best resolution: {result.resolution}")

# Annotate the clusters: score, label, and apply correction rules
ann = scb.Annotator(bp)
res = ann.apply(adata, "leiden", "cell_type", de_key="global_de")
print(res.summary())

# Inspect why cluster "3" received its label
ev = res.explain("3")
print(f"{ev.final_label}: {ev.score_breakdown}")
```

For YAML schema, correction rules, strategies and UMAP sweep see **[docs/usage.md](docs/usage.md)**.
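
To give a feel for what a coarse-to-fine resolution sweep does, here is a minimal sketch, assuming the objective is any function that maps a resolution to a score. It is not the `LeidenOptimizer` implementation: `coarse_to_fine`, its parameters, and the toy objective are hypothetical. In a real run the objective would cluster at the given resolution and count distinct biological labels.

```python
def coarse_to_fine(score, lo=0.1, hi=2.0, steps=5, rounds=3):
    """Maximize score(resolution) by refining a grid around the best point.

    Hypothetical sketch; names and defaults are illustrative assumptions.
    """
    best_res, best_score = lo, score(lo)
    for _ in range(rounds):
        step = (hi - lo) / (steps - 1)
        for i in range(steps):
            res = lo + i * step
            current = score(res)
            if current > best_score:
                best_res, best_score = res, current
        # Narrow the window to one grid step on either side of the current best.
        lo, hi = max(best_res - step, 1e-6), best_res + step
    return best_res, best_score


def toy_diversity(res):
    """Stand-in objective that peaks at resolution 0.8."""
    return -(res - 0.8) ** 2


best_res, best_score = coarse_to_fine(toy_diversity)
print(f"Best resolution: {best_res:.3f}")
```

Refining around the best coarse point keeps the number of clustering runs small, at the cost of possibly missing a narrow peak that falls between coarse grid points.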

## API

| Class | Description |
|---|---|
| `Blueprint` | Load YAML, access signatures / negative_markers / colors / rules |
| `LeidenOptimizer` | Scan resolutions, pick the one maximizing biological label diversity |
| `Annotator` | Score -> label -> correct -> explain |
| `UmapSweeper` | Multiprocessing UMAP parameter sweep with grid output |
| `LabelCorrectionRule` | ABC for custom correction rules |
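
The idea behind an ABC-based correction rule can be sketched as follows. The real `LabelCorrectionRule` interface is documented in docs/usage.md and may differ; `SketchRule`, `ExpressionThreshold`, the `apply` signature, and the marker values below are all hypothetical stand-ins.

```python
from abc import ABC, abstractmethod


class SketchRule(ABC):
    """Hypothetical base class; NOT scblueprint's LabelCorrectionRule."""

    @abstractmethod
    def apply(self, label: str, mean_expr: dict[str, float]) -> str:
        """Return the (possibly corrected) label for one cluster."""


class ExpressionThreshold(SketchRule):
    """Relabel a cluster when a required marker's mean expression is too low."""

    def __init__(self, target: str, gene: str, min_expr: float, fallback: str):
        self.target = target
        self.gene = gene
        self.min_expr = min_expr
        self.fallback = fallback

    def apply(self, label: str, mean_expr: dict[str, float]) -> str:
        # Only clusters carrying the target label are checked; a missing gene counts as 0.
        if label == self.target and mean_expr.get(self.gene, 0.0) < self.min_expr:
            return self.fallback
        return label


# Toy values: require mean Tnnt2 expression of at least 1.0 for cardiomyocytes.
rule = ExpressionThreshold("Cardiomyocyte", "Tnnt2", 1.0, "Unknown")
print(rule.apply("Cardiomyocyte", {"Tnnt2": 0.2}))
print(rule.apply("Cardiomyocyte", {"Tnnt2": 3.0}))
```

Keeping each rule a small object with a single method makes it straightforward to chain rules and record which one changed a label, which is the kind of information an explain mode needs.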

## Examples

Six runnable scripts cover basic annotation, auto-resolution, explain mode, custom rules, UMAP sweep, and a subpopulation deep dive. See **[docs/examples.md](docs/examples.md)** for the full list.

```bash
cd examples && python generate_all.py
```

## Citation

If you use **scblueprint** in a publication, please cite it:

**APA:**

> dam2452. (2026). scblueprint: Rule-based cell-type annotation for single-cell RNA-seq (Version 0.1.0). https://github.com/dam2452/scblueprint

**BibTeX:**

```bibtex
@software{scblueprint2026,
  title   = {scblueprint: Rule-based cell-type annotation for single-cell RNA-seq},
  author  = {dam2452},
  year    = {2026},
  version = {0.1.0},
  url     = {https://github.com/dam2452/scblueprint}
}
```

## Contributing

Contributions are welcome! Here's how you can help:

1. **Bug reports** - Open an issue with a minimal reproducible example
2. **Feature requests** - Open an issue describing the use case
3. **Code contributions** - Fork, create a feature branch, and open a pull request
4. **New presets** - Add a YAML file under `scblueprint/presets/` with markers and a test

### Development setup

```bash
git clone https://github.com/dam2452/scblueprint.git
cd scblueprint
pip install -e ".[dev]"
pytest tests/ -v
```

## License

This project is licensed under the **MIT** License - see [LICENSE](LICENSE) for details.
