Metadata-Version: 2.4
Name: gge-eval
Version: 0.1.1
Summary: Generated Genetic Expression Evaluator (GGE): Comprehensive evaluation of generated gene expression data. Computes metrics between real and generated datasets with support for condition matching, train/test splits, and publication-quality visualizations.
License: MIT
License-File: LICENSE
Keywords: gene expression,evaluation,metrics,single-cell,generative models,benchmarking,gge
Author: GGE Team
Author-email: gge@example.com
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Provides-Extra: full
Provides-Extra: gpu
Requires-Dist: anndata (>=0.8.0)
Requires-Dist: geomloss (>=0.2.1) ; extra == "full" or extra == "gpu"
Requires-Dist: matplotlib (>=3.5.0)
Requires-Dist: numpy (>=1.21.0)
Requires-Dist: pandas (>=1.3.0)
Requires-Dist: pykeops (>=1.4.0) ; extra == "full" or extra == "gpu"
Requires-Dist: scanpy (>=1.9.0)
Requires-Dist: scipy (>=1.7.0)
Requires-Dist: seaborn (>=0.11.0)
Requires-Dist: torch (>=1.9.0)
Requires-Dist: umap-learn (>=0.5.0) ; extra == "full"
Project-URL: Homepage, https://github.com/AndreaRubbi/gge
Project-URL: Repository, https://github.com/AndreaRubbi/gge
Description-Content-Type: text/markdown

# GGE: Generated Genetic Expression Evaluator

[![PyPI version](https://badge.fury.io/py/gge-eval.svg)](https://badge.fury.io/py/gge-eval)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/AndreaRubbi/gge/actions/workflows/test.yml/badge.svg)](https://github.com/AndreaRubbi/gge/actions)

**Comprehensive evaluation of generated gene expression data against real datasets.**

GGE is a modular, object-oriented Python framework for computing metrics between real and generated gene expression datasets stored in AnnData (h5ad) format. It supports condition-based matching and train/test splits, and it generates publication-quality visualizations.

## Features

### Metrics
Every metric is computed **per gene** (returning one value per gene) and is also reported as an **aggregate** score:

| Metric | Description | Direction |
|--------|-------------|-----------|
| **Pearson Correlation** | Linear correlation between expression profiles | Higher is better |
| **Spearman Correlation** | Rank correlation (robust to outliers) | Higher is better |
| **Wasserstein-1** | Earth Mover's Distance (L1) | Lower is better |
| **Wasserstein-2** | Quadratic optimal transport | Lower is better |
| **MMD** | Maximum Mean Discrepancy (kernel-based) | Lower is better |
| **Energy Distance** | Distance between distributions based on pairwise sample distances | Lower is better |
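
To make the per-gene versus aggregate distinction concrete, here is a minimal sketch for one of the metrics, Wasserstein-1. It is written in plain Python so it runs without GGE installed, and it is not GGE's implementation: the function names `wasserstein_1` and `per_gene_wasserstein_1` are hypothetical, the toy data is made up, and averaging over genes is an assumed aggregation rule.

```python
from bisect import bisect_right
from statistics import fmean


def wasserstein_1(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical samples."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    # Integrate |CDF_x - CDF_y| over each gap between consecutive observed values.
    for left, right in zip(grid, grid[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def per_gene_wasserstein_1(real, generated):
    """One distance per gene; inputs are samples x genes lists of rows."""
    real_columns = zip(*real)
    generated_columns = zip(*generated)
    return [wasserstein_1(r, g) for r, g in zip(real_columns, generated_columns)]


# Toy data (made up): 3 samples x 2 genes per dataset.
real = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]]
generated = [[1.0, 5.0], [2.0, 5.0], [3.0, 6.5]]

per_gene = per_gene_wasserstein_1(real, generated)  # vector, one entry per gene
aggregate = fmean(per_gene)  # assumed aggregation: mean over genes
print(per_gene, aggregate)
```

The two datasets may contain different numbers of samples, since the distance is computed from empirical CDFs rather than from paired rows.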

### Visualizations
- **Boxplots & Violin plots**: Metric distributions across conditions
- **Radar plots**: Multi-metric comparison
- **Scatter plots**: Real vs generated expression
- **Embedding plots**: PCA/UMAP of real vs generated data
- **Heatmaps**: Per-gene metric values

### Key Features
- ✅ Condition-based matching (perturbation, cell type, etc.)
- ✅ Train/test split support
- ✅ Per-gene and aggregate metrics
- ✅ Modular, extensible architecture
- ✅ Command-line interface
- ✅ Publication-quality visualizations

## Installation

### Using pip
```bash
pip install gge-eval
```

### Development installation
```bash
git clone https://github.com/AndreaRubbi/gge && cd gge
pip install -e .
```

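### With GPU support (faster distance metrics)
```bash
pip install "gge-eval[gpu]"
```

The `gpu` extra installs `geomloss` and `pykeops`. A `full` extra (`pip install "gge-eval[full]"`) installs those two packages plus `umap-learn`.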

## Quick Start

### Python API

```python
from gge import evaluate

# Run evaluation
results = evaluate(
    real_path="real_data.h5ad",
    generated_path="generated_data.h5ad",
    condition_columns=["perturbation", "cell_type"],
    split_column="split",  # Optional: for train/test
    output_dir="evaluation_output/"
)

# Access results
print(results.summary())

# Get metrics for a specific split
test_results = results.get_split("test")
for condition, cond_result in test_results.conditions.items():
    print(f"{condition}: Pearson={cond_result.get_metric_value('pearson'):.3f}")
```

### Command Line

```bash
# Basic usage
gge --real real.h5ad --generated generated.h5ad \
    --conditions perturbation cell_type \
    --output results/

# With split column
gge --real real.h5ad --generated generated.h5ad \
    --conditions perturbation \
    --split-column split \
    --splits test \
    --output results/

# Specify metrics
gge --real real.h5ad --generated generated.h5ad \
    --conditions perturbation \
    --metrics pearson spearman wasserstein_1 mmd \
    --output results/
```
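
When many evaluations need to be scripted, it can help to assemble the command line programmatically. The sketch below builds an argument list using only the flags shown above. The helper `build_gge_command` is hypothetical and not part of GGE, and the flag order it produces is just one reasonable choice.

```python
import shlex


def build_gge_command(real, generated, conditions, output,
                      split_column=None, splits=None, metrics=None):
    """Assemble a `gge` invocation from the flags shown above (hypothetical helper)."""
    command = ["gge", "--real", real, "--generated", generated,
               "--conditions", *conditions]
    if split_column is not None:
        command += ["--split-column", split_column]
    if splits:
        command += ["--splits", *splits]
    if metrics:
        command += ["--metrics", *metrics]
    command += ["--output", output]
    return command


command = build_gge_command(
    real="real.h5ad",
    generated="generated.h5ad",
    conditions=["perturbation"],
    output="results/",
    split_column="split",
    splits=["test"],
)
# Print a shell-quoted version for logging or copy-pasting.
print(shlex.join(command))
```

With GGE installed, the resulting list can be passed to `subprocess.run(command, check=True)`. Passing a list avoids shell-quoting problems with file paths that contain spaces.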

## Expected Data Format

GGE expects AnnData (h5ad) files with:

### Required
- `adata.X`: Gene expression matrix (samples × genes)
- `adata.var_names`: Gene identifiers (must overlap between datasets)
- `adata.obs[condition_columns]`: Columns for matching conditions

### Optional
- `adata.obs[split_column]`: Train/test split indicator
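
The requirements above boil down to three checks: the gene sets must overlap, rows must be groupable by the condition columns, and rows may optionally be filtered by split. The sketch below illustrates those checks on plain Python data. It is not GGE's matching logic; all function names, gene names, and row values are hypothetical, and each `obs` table is modelled as a list of dictionaries.

```python
from collections import defaultdict


def shared_genes(real_var_names, generated_var_names):
    """Genes present in both datasets, in the order of the real dataset."""
    generated = set(generated_var_names)
    return [gene for gene in real_var_names if gene in generated]


def group_by_condition(obs_rows, condition_columns, split_column=None, split=None):
    """Map each condition tuple to the row indices that carry it."""
    groups = defaultdict(list)
    for index, row in enumerate(obs_rows):
        # Optional filter: keep only rows from the requested split.
        if split_column is not None and row[split_column] != split:
            continue
        key = tuple(row[column] for column in condition_columns)
        groups[key].append(index)
    return dict(groups)


def matched_conditions(real_obs, generated_obs, condition_columns):
    """Conditions found in both datasets, with the row indices on each side."""
    real_groups = group_by_condition(real_obs, condition_columns)
    generated_groups = group_by_condition(generated_obs, condition_columns)
    return {key: (rows, generated_groups[key])
            for key, rows in real_groups.items() if key in generated_groups}


# Made-up example rows standing in for adata.obs.
real_obs = [
    {"perturbation": "ctrl", "cell_type": "T", "split": "train"},
    {"perturbation": "drugA", "cell_type": "T", "split": "test"},
    {"perturbation": "ctrl", "cell_type": "T", "split": "test"},
]
generated_obs = [
    {"perturbation": "ctrl", "cell_type": "T"},
    {"perturbation": "drugB", "cell_type": "T"},
]
columns = ["perturbation", "cell_type"]

genes = shared_genes(["GeneA", "GeneB", "GeneC"], ["GeneC", "GeneA", "GeneD"])
pairs = matched_conditions(real_obs, generated_obs, columns)
test_groups = group_by_condition(real_obs, columns, split_column="split", split="test")
print(genes, pairs, test_groups)
```

In this toy example only the `("ctrl", "T")` condition appears in both datasets, so it is the only one that could be compared.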

## Output Structure

```
output/
├── summary.json          # Aggregate metrics and metadata
├── results.csv           # Per-condition metrics table
├── per_gene_*.csv        # Per-gene metric values
└── plots/
    ├── boxplot_metrics.png
    ├── violin_metrics.png
    ├── radar_split.png
    ├── scatter_grid.png
    └── embedding_pca.png
```
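
For downstream analysis, the tabular outputs can be read back with the standard library alone. This is a minimal sketch that relies only on the file names in the tree above. The helper `load_outputs` is hypothetical, and the sketch assumes nothing about the keys in `summary.json` or the columns in the CSV files.

```python
import csv
import json
from pathlib import Path


def load_outputs(output_dir):
    """Read summary.json and results.csv, and list the per-gene CSV files."""
    root = Path(output_dir)
    with open(root / "summary.json", encoding="utf-8") as handle:
        summary = json.load(handle)
    with open(root / "results.csv", newline="", encoding="utf-8") as handle:
        # Each row becomes a dict keyed by the CSV header; values stay strings.
        results = list(csv.DictReader(handle))
    per_gene_files = sorted(root.glob("per_gene_*.csv"))
    return summary, results, per_gene_files
```

Calling `load_outputs("output/")` after an evaluation run returns the parsed summary, one dictionary per row of `results.csv`, and the paths of the per-gene files. Numeric columns must be converted explicitly because `csv.DictReader` returns every value as a string.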

## Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

## License

This project is licensed under the MIT License. See the LICENSE file for details.
