Metadata-Version: 2.4
Name: microplex
Version: 0.1.0
Summary: Microdata synthesis and reweighting using normalizing flows
Project-URL: Homepage, https://github.com/CosilicoAI/microplex
Project-URL: Documentation, https://cosilicoai.github.io/microplex
Project-URL: Repository, https://github.com/CosilicoAI/microplex
Author-email: Cosilico <hello@cosilico.ai>
License-Expression: MIT
License-File: LICENSE
Keywords: imputation,microdata,normalizing-flows,privacy,survey-data,synthesis
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scipy>=1.10
Requires-Dist: torch>=2.0
Provides-Extra: all
Requires-Dist: jupyter-book>=0.15; extra == 'all'
Requires-Dist: matplotlib>=3.7; extra == 'all'
Requires-Dist: mypy>=1.0; extra == 'all'
Requires-Dist: myst-nb>=0.17; extra == 'all'
Requires-Dist: pytest-cov>=4.0; extra == 'all'
Requires-Dist: pytest>=7.0; extra == 'all'
Requires-Dist: ruff>=0.1; extra == 'all'
Requires-Dist: scikit-learn>=1.3; extra == 'all'
Requires-Dist: sdv>=1.0; extra == 'all'
Requires-Dist: seaborn>=0.12; extra == 'all'
Requires-Dist: sphinx-autodoc-typehints>=1.23; extra == 'all'
Requires-Dist: sphinx>=6.0; extra == 'all'
Provides-Extra: benchmark
Requires-Dist: matplotlib>=3.7; extra == 'benchmark'
Requires-Dist: scikit-learn>=1.3; extra == 'benchmark'
Requires-Dist: sdv>=1.0; extra == 'benchmark'
Requires-Dist: seaborn>=0.12; extra == 'benchmark'
Provides-Extra: dev
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Provides-Extra: docs
Requires-Dist: jupyter-book>=0.15; extra == 'docs'
Requires-Dist: myst-nb>=0.17; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints>=1.23; extra == 'docs'
Requires-Dist: sphinx>=6.0; extra == 'docs'
Description-Content-Type: text/markdown

# microplex

Microdata synthesis and reweighting using normalizing flows.

[![PyPI](https://img.shields.io/pypi/v/microplex.svg)](https://pypi.org/project/microplex/)
[![Tests](https://github.com/CosilicoAI/microplex/actions/workflows/test.yml/badge.svg)](https://github.com/CosilicoAI/microplex/actions/workflows/test.yml)
[![Docs](https://github.com/CosilicoAI/microplex/actions/workflows/docs.yml/badge.svg)](https://cosilicoai.github.io/microplex)

## Overview

`microplex` creates rich, calibrated microdata by modeling:

- **Conditional relationships**: Generate target variables given demographics
- **Zero-inflated distributions**: Handle variables that are 0 for many observations
- **Joint correlations**: Preserve relationships between target variables
- **Hierarchical structures**: Keep household/firm compositions intact
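
To make the zero-inflation point concrete, here is a minimal sketch of a two-part draw: a value is zero with some probability, and otherwise comes from a strictly positive distribution. The function name `sample_zero_inflated`, the lognormal choice, and all parameter values are assumptions for illustration only; this is not `microplex`'s internal implementation.

```python
import random


def sample_zero_inflated(n, p_zero, mu, sigma, seed=0):
    """Draw n values that are 0 with probability p_zero, else lognormal(mu, sigma)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < p_zero:
            draws.append(0.0)  # the "zero" part of the mixture
        else:
            draws.append(rng.lognormvariate(mu, sigma))  # strictly positive part
    return draws


# Hypothetical "savings" variable where roughly 40% of records hold nothing.
savings = sample_zero_inflated(n=10_000, p_zero=0.4, mu=8.0, sigma=1.0)
zero_share = sum(v == 0.0 for v in savings) / len(savings)
print(f"share of zeros: {zero_share:.3f}")
```

Fitting a single continuous density to such a variable tends to smear probability mass around zero, which is why the zero indicator and the positive part are usually modeled separately.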

## Installation

```bash
pip install microplex
```

The package also declares the optional extras `benchmark`, `dev`, `docs`, and `all`. For example, `pip install "microplex[benchmark]"` additionally installs the comparison dependencies (matplotlib, scikit-learn, sdv, seaborn).

## Quick Start

```python
from microplex import Synthesizer
import pandas as pd

# Load training data with known target variables
training_data = pd.read_csv("survey_with_income.csv")

# Initialize synthesizer
synth = Synthesizer(
    target_vars=["income", "expenditure", "savings"],
    condition_vars=["age", "education", "region"],
)

# Fit on training data
synth.fit(training_data, weight_col="weight", epochs=100)

# Generate synthetic targets for new demographics
new_demographics = pd.read_csv("demographics_only.csv")
synthetic = synth.generate(new_demographics)
```

## Why `microplex`?

| Feature | microplex | CT-GAN | TVAE | synthpop |
|---------|-------|--------|------|----------|
| Conditional generation | ✅ | ❌ | ❌ | ❌ |
| Zero-inflation handling | ✅ | ❌ | ❌ | ⚠️ |
| Exact likelihood | ✅ | ❌ | ❌ | N/A |
| Stable training | ✅ | ⚠️ | ✅ | ✅ |
| Preserves source structure | ✅ | ❌ | ❌ | ⚠️ |
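
The "Exact likelihood" row refers to a general property of normalizing flows: for an invertible map `z = f(x)`, the density follows from the change of variables `log p(x) = log p_z(f(x)) + log |det ∂f/∂x|`. The sketch below checks this for a deliberately tiny one-dimensional "flow" (log, then shift and scale) against the closed-form lognormal density. The function names and parameter values are assumptions for illustration, and a real flow would stack many learned, conditioned transforms.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of the N(0, 1) base distribution."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def flow_logpdf(x, shift, scale):
    """Exact log-density of x > 0 under z = (log(x) - shift) / scale, z ~ N(0, 1)."""
    z = (math.log(x) - shift) / scale
    # dz/dx = 1 / (scale * x), so log|dz/dx| = -log(scale) - log(x)
    log_det = -math.log(scale) - math.log(x)
    return standard_normal_logpdf(z) + log_det


def lognormal_logpdf(x, mu, sigma):
    """Closed-form lognormal log-density, used as an independent check."""
    return (-math.log(x * sigma * math.sqrt(2.0 * math.pi))
            - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))


x = 35_000.0
via_flow = flow_logpdf(x, shift=10.0, scale=0.8)
closed_form = lognormal_logpdf(x, mu=10.0, sigma=0.8)
print(via_flow, closed_form)
```

The two numbers agree to floating-point precision, which is what "exact" means here: the likelihood is computed directly, with no variational bound or adversarial proxy.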

### Use Cases

- **Survey enhancement**: Impute income variables from tax data onto census demographics
- **Privacy-preserving synthesis**: Generate synthetic data that preserves statistical properties without copying real records
- **Data fusion**: Combine variables from multiple surveys with different sample designs
- **Missing data imputation**: Fill in missing values conditioned on observed variables

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                      Synthesizer                         │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Training:                                               │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Training │───▶│ Transformer  │───▶│ Normalizing  │  │
│  │   Data   │    │ (log, std)   │    │    Flow      │  │
│  └──────────┘    └──────────────┘    └──────────────┘  │
│                                                          │
│  Generation:                                             │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Context  │───▶│ Zero + Flow  │───▶│  Inverse     │  │
│  │  Vars    │    │   Sampling   │    │  Transform   │  │
│  └──────────┘    └──────────────┘    └──────────────┘  │
│                                                          │
└─────────────────────────────────────────────────────────┘
```
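
As a rough illustration of the "Transformer (log, std)" and "Inverse Transform" boxes, the sketch below log-transforms positive values, standardizes them, and then maps them back. `LogStandardizer` is a hypothetical stand-in written for this example, not a class exported by `microplex`, and it assumes zeros have already been split off by the separate zero step shown in the diagram.

```python
import math
import statistics


class LogStandardizer:
    """Hypothetical "(log, std)" step: natural log, then z-score."""

    def fit(self, values):
        logged = [math.log(v) for v in values]  # assumes every value is > 0
        self.mean_ = statistics.fmean(logged)
        self.std_ = statistics.pstdev(logged)
        return self

    def transform(self, values):
        # Forward direction, applied to training data before the flow.
        return [(math.log(v) - self.mean_) / self.std_ for v in values]

    def inverse_transform(self, z_values):
        # Backward direction, applied to samples drawn from the flow.
        return [math.exp(z * self.std_ + self.mean_) for z in z_values]


incomes = [12_000.0, 30_000.0, 45_000.0, 80_000.0, 250_000.0]
scaler = LogStandardizer().fit(incomes)
z = scaler.transform(incomes)
restored = scaler.inverse_transform(z)
print([round(v, 3) for v in z])
```

Working in standardized log space keeps heavy-tailed variables such as income on a scale where a flow is easier to train, and the inverse step returns samples to their original units.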

## Documentation

Full documentation is available at [cosilicoai.github.io/microplex](https://cosilicoai.github.io/microplex):

- [Tutorial](https://cosilicoai.github.io/microplex/tutorial.html)
- [API Reference](https://cosilicoai.github.io/microplex/api.html)
- [Benchmarks](https://cosilicoai.github.io/microplex/benchmarks.html)

## Benchmarks

See [benchmarks/](benchmarks/) for comparisons against:

- **CT-GAN**: Conditional Tabular GAN (from SDV)
- **TVAE**: Tabular VAE (from SDV)
- **Copulas**: Gaussian copula synthesis (from SDV)
- **synthpop**: CART-based synthesis (R package, via rpy2)

## Citation

```bibtex
@software{microplex2024,
  author = {Cosilico},
  title = {microplex: Microdata synthesis and reweighting using normalizing flows},
  year = {2024},
  url = {https://github.com/CosilicoAI/microplex}
}
```

## License

MIT License - see [LICENSE](LICENSE) for details.
