Metadata-Version: 2.4
Name: synomicsbench
Version: 1.0.3
License-File: LICENSE
Requires-Python: >=3.12
Requires-Dist: anonymeter>=1.0.0
Requires-Dist: baycomp>=1.0.3
Requires-Dist: codecarbon>=3.2.5
Requires-Dist: copulas>=0.12.0
Requires-Dist: ctgan>=0.10.2
Requires-Dist: deptry>=0.25.1
Requires-Dist: gseapy>=1.1.12
Requires-Dist: ipykernel>=7.2.0
Requires-Dist: joblib>=1.5.3
Requires-Dist: jupyter>=1.1.1
Requires-Dist: lab>=8.9
Requires-Dist: lifelines>=0.30.3
Requires-Dist: lightgbm>=4.6.0
Requires-Dist: matplotlib>=3.10.8
Requires-Dist: memory-profiler>=0.61.0
Requires-Dist: miceforest>=6.0.5
Requires-Dist: missingno>=0.5.2
Requires-Dist: mpl-tools>=0.4.1
Requires-Dist: mygene>=3.2.2
Requires-Dist: numba>=0.64.0
Requires-Dist: numpy>=1.26.4
Requires-Dist: pandas>=2.3.3
Requires-Dist: polar>=0.0.127
Requires-Dist: polars>=1.39.2
Requires-Dist: pytest>=9.0.3
Requires-Dist: rpy2==3.5.16
Requires-Dist: scienceplots>=2.2.1
Requires-Dist: scikit-bio>=0.7.0
Requires-Dist: scikit-learn>=1.8.0
Requires-Dist: scipy>=1.17.1
Requires-Dist: sdmetrics>=0.18.0
Requires-Dist: sdv>=1.18.0
Requires-Dist: seaborn>=0.13.2
Requires-Dist: statsmodels>=0.14.6
Requires-Dist: torch>=2.10.0
Requires-Dist: tqdm>=4.67.3
Description-Content-Type: text/markdown

# SynOmicsBench

**SynOmicsBench** is a unified benchmarking framework for synthetic data generation (SDG) on clinical transcriptomic cancer cohorts.

Sharing clinical transcriptomic datasets for artificial intelligence in precision oncology requires balancing **biological utility** against **patient privacy**. **SynOmicsBench** addresses this by combining standardized preprocessing with multidimensional evaluation that prioritizes downstream biological validation alongside statistical fidelity and attack-based privacy assessment. The framework serves as a reproducible decision-support tool for method selection and promotes biologically informed, privacy-aware adoption of synthetic data in precision oncology.

---

## Installation

```bash
pip install synomicsbench
```

Python 3.12+ is required.
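
To confirm that the environment meets this requirement and that the package is visible to the interpreter, a small check along the following lines may help. It is a minimal sketch that relies only on the standard library; the helper names `python_is_supported` and `installed_version` are illustrative and are not part of SynOmicsBench.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum interpreter version, taken from the requirement stated above.
MIN_PYTHON = (3, 12)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version is at least 3.12."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package="synomicsbench"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("synomicsbench version:", installed_version())
```

If the second line prints `None`, the package is not installed in the active environment.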

---

## Quick Start

```python
import pandas as pd
from synomicsbench.processing.preprocessing import DataProcessor
from synomicsbench.processing.metadata import MetaData
from synomicsbench.synthesizer.GaussianCopulasynthesizer import GaussianCopulasynthesizer
from synomicsbench.metrics.fidelity.UnivariateSimilarity import UnivariateSimilarity

# 1. Preprocess
data = pd.read_csv("clinical_transcriptomic_data.csv")
data = DataProcessor.remove_unknown_entities(data, id_column="Patient_ID")
data = DataProcessor.remove_duplications(data, axis=0).reset_index(drop=True)
data = DataProcessor.mice_imputation(data, iterations=10, n_estimators=100)

# 2. Metadata
metadata = MetaData.get_metadata(
    data=data,
    ordinal_features=["Mstage", "Tx_Start_ECOG", "numPriorTherapies"],
    threshold_unique_values=10,
)

# 3. Generate synthetic data
synth = GaussianCopulasynthesizer(output_path="./results", metadata=metadata)
synthetic_data = synth.generate(
    data=data,
    seed=42,
    n_samples=data.shape[0],
    output_filename="synthetic_data.csv",
)

# 4. Evaluate
evaluator = UnivariateSimilarity(output_dir="./results/evaluation")
score = evaluator.get_univariate_score(
    original_data=data,
    synthetic_data=synthetic_data,
    metadata=metadata,
    save=True,
)
print(f"Univariate Fidelity Score: {score:.4f}")
```
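
For intuition about what a univariate fidelity score can capture, the sketch below compares one real and one synthetic column through the complement of the two-sample Kolmogorov-Smirnov statistic, then averages over columns. This is an illustrative assumption about one common way to measure per-feature similarity; it is not necessarily how `UnivariateSimilarity` computes its score, and the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 - D, where D is the two-sample Kolmogorov-Smirnov statistic.

    1.0 means the two empirical distributions coincide; 0.0 means they
    do not overlap at all.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")
    # Empirical CDFs only change at observed values, so checking those suffices.
    points = set(real_sorted) | set(synth_sorted)
    d = max(
        abs(
            bisect_right(real_sorted, x) / len(real_sorted)
            - bisect_right(synth_sorted, x) / len(synth_sorted)
        )
        for x in points
    )
    return 1.0 - d


def mean_ks_complement(real_columns, synthetic_columns):
    """Average the per-column score over columns present in both mappings."""
    shared = [name for name in real_columns if name in synthetic_columns]
    if not shared:
        raise ValueError("no shared columns to compare")
    scores = [ks_complement(real_columns[c], synthetic_columns[c]) for c in shared]
    return sum(scores) / len(scores)


# Hypothetical toy columns, for illustration only.
real = {"gene_a": [1, 2, 3, 4], "gene_b": [0.1, 0.2, 0.3]}
synthetic = {"gene_a": [3, 4, 5, 6], "gene_b": [0.1, 0.2, 0.3]}
print(mean_ks_complement(real, synthetic))
```

A score of this kind only reflects marginal distributions; it says nothing about correlations between features or about downstream biological signal.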

---

## Documentation

Full documentation, API reference, and benchmarking results:
**[https://trinhthechuong.github.io/SynOmicsBench/](https://trinhthechuong.github.io/SynOmicsBench/)**

---

## Citation

If you use SynOmicsBench in your research, please cite:

> Trinh, T. C., Woillard, J. B., Uguzzoni, G., & Battail, C. (2024). **A unified benchmark of synthetic data generation for clinical and transcriptomic cancer data.** *(Manuscript in preparation)*

---

## License

MIT License
