Metadata-Version: 2.4
Name: omicsync
Version: 0.1.0
Summary: Multi-omics data harmonisation for Python
Author-email: "Paterson V." <citrus.bird72@gmail.com>
License: MIT License
        
        Copyright (c) 2026 Paterson V.
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/vi-c-ky/omicsync
Project-URL: Documentation, https://github.com/vi-c-ky/omicsync/blob/main/docs/index.md
Project-URL: Repository, https://github.com/vi-c-ky/omicsync
Project-URL: Bug Tracker, https://github.com/vi-c-ky/omicsync/issues
Keywords: bioinformatics,multi-omics,TCGA,genomics,data harmonisation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5.0
Requires-Dist: numpy>=1.23.0
Requires-Dist: scipy>=1.9.0
Requires-Dist: scikit-learn>=1.1.0
Requires-Dist: requests>=2.28.0
Provides-Extra: mofa
Requires-Dist: mofapy2>=0.7.0; extra == "mofa"
Provides-Extra: geo
Requires-Dist: GEOparse>=2.0.0; extra == "geo"
Provides-Extra: torch
Requires-Dist: torch>=1.12.0; extra == "torch"
Provides-Extra: anndata
Requires-Dist: anndata>=0.8.0; extra == "anndata"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Provides-Extra: all
Requires-Dist: omicsync[anndata,geo,mofa,torch]; extra == "all"
Dynamic: license-file

# omicsync

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![PyPI version](https://img.shields.io/pypi/v/omicsync.svg)](https://pypi.org/project/omicsync/)

**A Python library for multi-omics data harmonisation.**

omicsync handles the tedious work of aligning sample IDs, normalising each modality consistently, and exporting to downstream tools so you can focus on biology, not data wrangling.

---

## Installation

```bash
pip install omicsync
```

With optional extras:

```bash
pip install "omicsync[mofa]"       # MOFA2 factor analysis
pip install "omicsync[geo]"        # GEO data loading
pip install "omicsync[anndata]"    # AnnData export
pip install "omicsync[torch]"      # PyTorch tensor export
pip install "omicsync[all]"        # Everything
```
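
Each extra installs one third-party package. If you are not sure which extras are present in an environment, a check along these lines may help. It is a minimal sketch that uses only the standard library; `EXTRA_MODULES` and `available_extras` are illustrative names and are not part of omicsync. The module names are the import names of the distributions listed for each extra.

```python
from importlib.util import find_spec

# Import name of the package that each omicsync extra installs.
# This mapping is maintained by hand for illustration.
EXTRA_MODULES = {
    "mofa": "mofapy2",
    "geo": "GEOparse",
    "anndata": "anndata",
    "torch": "torch",
}


def available_extras() -> dict[str, bool]:
    """Report, per extra, whether its package can be found for import."""
    # find_spec returns None for a top-level module that is not installed.
    return {
        extra: find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, present in available_extras().items():
        status = "installed" if present else "missing"
        print(f"omicsync[{extra}]: {status}")
```

The check only locates the packages without importing them, so it stays fast even when a heavy dependency such as PyTorch is installed.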

---

## Quick Start

```python
import omicsync as oms
from omicsync.loaders.csv import load_multimodal_csv

# Load multiple modalities from CSV files
dataset = load_multimodal_csv({
    "rna":     "brca_rna.tsv",
    "protein": "brca_rppa.tsv",
    "cnv":     "brca_cnv.tsv",
}, study_id="TCGA-BRCA")

# Align, normalise, filter — all chainable
dataset.align_samples().normalize().filter_features(min_variance=0.01)

# Export to DataFrame or MOFA2
df = dataset.to_dataframe()          # samples × features, prefixed columns
mofa_input = dataset.to_mofa2()      # dict ready for mofapy2 entry_point
```
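
To picture what an aligned samples × features table with prefixed columns looks like, here is a rough pandas-only sketch. It is not omicsync's implementation: the `combine_modalities` helper and the toy values are assumptions made for illustration, and each input table is assumed to be indexed by sample ID.

```python
import pandas as pd


def combine_modalities(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Keep samples present in every table and prefix columns by modality."""
    prefixed = [df.add_prefix(f"{name}_") for name, df in tables.items()]
    # join="inner" keeps only the sample IDs shared by all modalities.
    return pd.concat(prefixed, axis=1, join="inner")


# Toy data: S1 has RNA measurements but no protein measurements.
rna = pd.DataFrame({"TP53": [5.1, 3.2, 4.4]}, index=["S1", "S2", "S3"])
protein = pd.DataFrame({"TP53": [0.3, -1.2]}, index=["S2", "S3"])

combined = combine_modalities({"rna": rna, "protein": protein})
print(combined)
```

Prefixing keeps features with the same name apart, for example a gene measured at both the RNA and the protein level, and the inner join drops any sample that is missing from one of the modalities.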

---

## Features

- **Sample harmonisation**: TCGA barcode parsing, fuzzy ID matching, coverage reporting
- **Per-modality normalisation**: auto-detection of count/TPM/M-value formats
- **Chainable API**: `dataset.align_samples().normalize().filter_features()`
- **sklearn compatibility**: use `OmicsSyncTransformer` in a `Pipeline`
- **Multiple export formats**: DataFrame, dict, MOFA2, PyTorch tensor, AnnData
- **Open Targets integration**: query target-disease associations via GraphQL
- **Type hints throughout**: fully typed public API
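
As background for the sample harmonisation feature, the sketch below shows one way TCGA barcodes can be reduced to a shared key. It relies only on the public barcode layout (project, tissue source site, participant, then a sample field that starts with a two-digit sample type code). The names `TCGABarcode`, `parse_barcode` and `shared_participants` are hypothetical and do not describe omicsync's internals, and the barcodes are illustrative.

```python
from typing import NamedTuple


class TCGABarcode(NamedTuple):
    participant: str  # first three fields, e.g. "TCGA-A1-A0SB"
    sample_type: str  # two-digit code, e.g. "01"

    @property
    def is_tumor(self) -> bool:
        # Codes 01-09 denote tumour samples; 10-19 denote normal samples.
        return 1 <= int(self.sample_type) <= 9


def parse_barcode(barcode: str) -> TCGABarcode:
    """Split a TCGA barcode into participant ID and sample type code."""
    # Some exports replace hyphens with dots or change the case.
    parts = barcode.strip().upper().replace(".", "-").split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    return TCGABarcode("-".join(parts[:3]), parts[3][:2])


def shared_participants(*id_lists: list[str]) -> set[str]:
    """Return participant IDs that occur in every list of barcodes."""
    sets = [{parse_barcode(b).participant for b in ids} for ids in id_lists]
    return set.intersection(*sets) if sets else set()


rna_ids = ["TCGA-A1-A0SB-01A-11R-A144-07", "TCGA-A2-A04P-01A-31R-A034-07"]
rppa_ids = ["tcga.a1.a0sb.01a.21.a13a.20"]
print(shared_participants(rna_ids, rppa_ids))
```

Matching on the participant portion is the coarsest choice. A stricter match would also compare the sample type code, so that a tumour sample is never paired with a normal sample from the same participant.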

---

## Supported Data Sources

| Source | Loader | Notes |
|--------|--------|-------|
| TCGA | `load_tcga_files()` | Local files; barcode auto-harmonisation |
| GEO | `load_geo()` | Via GEOparse; requires `omicsync[geo]` |
| CSV/TSV | `load_csv()` | Any tabular file |
| Open Targets | `load_open_targets_targets()` | GraphQL API v4 |

---

## Supported Modalities

| Modality | Class | Default Normalisation |
|----------|-------|-----------------------|
| RNA expression | `RNAModality` | `detect_and_normalise()` (log1p) |
| DNA methylation | `MethylationModality` | M→beta conversion + clip |
| Copy number | `CNVModality` | log2 ratio, clipped [-2, 2] |
| Somatic mutations | `MutationModality` | Binarise at threshold |
| Protein abundance | `ProteinModality` | Z-score per protein |
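
The transformations named in the table are standard ones, and the NumPy sketch below shows what each amounts to numerically. It is an illustration under assumptions, not the code behind the modality classes: the function names, the `eps` bound used when clipping beta values, and the strict `>` comparison used for binarising are all choices made for this example.

```python
import numpy as np


def log1p_counts(counts):
    """log(1 + x), which keeps zero counts at zero."""
    return np.log1p(np.asarray(counts, dtype=float))


def m_to_beta(m_values, eps=1e-6):
    """Convert methylation M-values to beta values, then clip."""
    m = np.asarray(m_values, dtype=float)
    # Equivalent to 2**M / (2**M + 1), written to stay finite for large M.
    beta = 1.0 / (1.0 + 2.0 ** (-m))
    return np.clip(beta, eps, 1.0 - eps)


def clip_log2_ratio(log2_ratio, low=-2.0, high=2.0):
    """Limit copy-number log2 ratios to the range [low, high]."""
    return np.clip(np.asarray(log2_ratio, dtype=float), low, high)


def binarise(values, threshold=0.0):
    """Return 1 where a value exceeds the threshold, otherwise 0."""
    return (np.asarray(values) > threshold).astype(int)


def zscore_per_feature(matrix):
    """Z-score each column of a samples x features matrix."""
    x = np.asarray(matrix, dtype=float)
    std = x.std(axis=0)
    # Constant features would divide by zero; leave them centred at 0.
    std = np.where(std == 0, 1.0, std)
    return (x - x.mean(axis=0)) / std


print(m_to_beta([-1.0, 0.0, 1.0]))
print(clip_log2_ratio([-3.0, 0.5, 4.0]))
```

An M-value of 0 corresponds to a beta value of 0.5, because M is defined as log2(beta / (1 - beta)).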

---

## Documentation

- [Quickstart guide](docs/quickstart.md)
- [API reference](docs/api_reference.md)
- [Tutorial: TCGA BRCA](docs/tutorials/tcga_brca.md)
- [Tutorial: Custom CSV data](docs/tutorials/custom_data.md)

---

## Citation

If you use omicsync in your research, please cite:

> Paterson V. (2026). *omicsync: A Python library for multi-omics data harmonisation*. GitHub: https://github.com/vi-c-ky/omicsync

---

## Contributing

Contributions are welcome. Please open an issue or pull request on GitHub.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/my-feature`)
3. Write tests for new functionality
4. Run the test suite (`pytest tests/`)
5. Open a pull request

---

## License

MIT — see [LICENSE](LICENSE) for details.
