Metadata-Version: 2.4
Name: sonata-learn
Version: 0.1.0
Summary: A Python toolkit for fitting and analyzing mutational signatures
License-Expression: MIT
License-File: LICENSE
Keywords: AnnData,bioinformatics,cancer genomics,mutational signatures,NMF
Author: Benedikt Geiger
Author-email: benedikt_geiger@g.harvard.edu
Maintainer: Benedikt Geiger
Maintainer-email: benedikt_geiger@g.harvard.edu
Requires-Python: >=3.10,<3.14
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Dist: adjustText (>=1.3,<2)
Requires-Dist: anndata (>=0.11,<0.13)
Requires-Dist: fastcluster (>=1.2.6,<2)
Requires-Dist: matplotlib (>=3.8,<4)
Requires-Dist: numba (>=0.61,<0.63)
Requires-Dist: numpy (>=1.26,<3)
Requires-Dist: pandas (>=2.2,<3)
Requires-Dist: scikit-learn (>=1.6,<2)
Requires-Dist: scipy (>=1.13,<2)
Requires-Dist: seaborn (>=0.13,<0.14)
Requires-Dist: umap-learn (>=0.5.7,<0.6)
Project-URL: Documentation, https://github.com/parklab/Sonata/blob/main/docs/tutorial.md
Project-URL: Homepage, https://github.com/parklab/Sonata
Project-URL: Repository, https://github.com/parklab/Sonata
Description-Content-Type: text/markdown

# Sonata

[![Python versions supported][python-image]][python-url]
[![License][license-image]][license-url]
[![Code style][style-image]][style-url]

[python-image]: https://img.shields.io/badge/python-3.10%20|%203.11%20|%203.12%20|%203.13-blue.svg
[python-url]: https://github.com/parklab/Sonata
[license-image]: https://img.shields.io/badge/License-MIT-yellow.svg
[license-url]: https://github.com/parklab/Sonata/blob/main/LICENSE
[style-image]: https://img.shields.io/badge/code%20style-black-000000.svg
[style-url]: https://github.com/psf/black

Sonata is a Python toolkit for fitting and analyzing mutational signatures. It
learns signatures and per-sample exposures from mutation counts stored in
[AnnData](https://anndata.readthedocs.io/en/latest/) objects and provides
analysis and plotting APIs for signature workflows.

## Installation

```bash
pip install sonata-learn
```

The package is installed as `sonata-learn` and imported as `sonata`.

## Quickstart

```python
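import sonata as so

# `adata` is an AnnData object holding mutation counts (see "Data Format").
# Fit an NMF model with six signatures.
model = so.models.NMF(n_signatures=6)
model.fit(adata)

# Plot the learned signatures and the per-sample exposures.
so.pl.barplot(model.asignatures)
so.pl.stacked_barplot(model.exposures)

# Embed the samples by their exposures with UMAP, then plot the embedding.
so.tl.reduce_dimension(
    model.adata,
    basis="exposures",
    method="umap",
)
so.pl.embedding(model.adata, basis="umap")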
```

## Data Format

Sonata expects mutation counts in an `AnnData` object:

- `adata.X`: count matrix with shape `n_samples x n_mutation_types`.
- `adata.obs`: optional sample annotations.
- `adata.var`: optional mutation-type annotations.

After fitting, the model stores learned signatures in `model.asignatures` and
sample exposures in `model.adata.obsm["exposures"]`.

## Documentation

For a complete workflow covering data preparation, NMF, visualization,
fixed signatures, Cornet, and simple model selection, see the
[Markdown tutorial][tutorial-md]. A runnable notebook with the same analysis and
figure-generation code is available at [docs/tutorial.ipynb][tutorial-ipynb].

## Models

Sonata currently exposes three algorithms:

- `so.models.NMF`: non-negative matrix factorization (NMF) with the generalized Kullback-Leibler divergence.
- `so.models.MvNMF`: minimum-volume NMF.
- `so.models.Cornet`: correlated NMF with joint sample and signature embeddings.
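
To give a rough idea of what the objective behind `so.models.NMF` involves, the
sketch below runs the classic multiplicative updates for NMF under the
generalized Kullback-Leibler divergence in plain NumPy. It is not Sonata's
implementation: the function names `kl_divergence` and `kl_nmf`, the random
initialization, and the fixed iteration count are all assumptions made for
illustration.

```python
import numpy as np


def kl_divergence(X, X_hat, eps=1e-10):
    """Generalized KL divergence D(X || X_hat), treating 0 * log(0) as 0."""
    X_hat = X_hat + eps
    log_ratio = np.log(np.where(X > 0, X / X_hat, 1.0))
    return float(np.sum(X * log_ratio - X + X_hat))


def kl_nmf(X, n_signatures, n_iter=200, eps=1e-10, seed=0):
    """Factor X (samples x mutation types) as W @ H with W, H >= 0.

    W holds per-sample exposures, H holds signatures (one per row).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_types = X.shape
    W = rng.random((n_samples, n_signatures)) + eps
    H = rng.random((n_signatures, n_types)) + eps

    for _ in range(n_iter):
        # Multiplicative update for the exposures.
        ratio = X / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1) + eps)
        # Multiplicative update for the signatures.
        ratio = X / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)

    # Rescale so each signature sums to 1; W absorbs the scale,
    # which leaves the product W @ H unchanged.
    scale = H.sum(axis=1, keepdims=True)
    return W * scale.T, H / scale


# Synthetic counts for illustration only.
X = np.random.default_rng(1).poisson(10.0, size=(12, 9)).astype(float)
W, H = kl_nmf(X, n_signatures=3)
print("divergence after fitting:", kl_divergence(X, W @ H))
```

Normalizing each row of `H` to sum to 1 lets every signature be read as a
distribution over mutation types, while the matching column of `W` carries the
number of mutations attributed to that signature in each sample.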

## License

Sonata is released under the [MIT License][license-url].

[tutorial-md]: https://github.com/parklab/Sonata/blob/main/docs/tutorial.md
[tutorial-ipynb]: https://github.com/parklab/Sonata/blob/main/docs/tutorial.ipynb

