Metadata-Version: 2.4
Name: protegrity-synthetic-data-sdk
Version: 2.0.0
Summary: Python SDK for Synthetic Data REST API - remote synthetic data generation
Project-URL: Homepage, https://www.protegrity.com/capabilities/synthetic-data
Project-URL: Documentation, https://docs.protegrity.com/
Author-email: Protegrity <info@protegrity.com>
License: MIT
Keywords: api-client,data-generation,rest-client,sdk,synthetic-data
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.14,>=3.11
Requires-Dist: httpx>=0.28.1
Requires-Dist: numpy>=2.0.2
Requires-Dist: pandas>=2.3.3
Provides-Extra: benchmark
Requires-Dist: psutil>=6.0.0; extra == 'benchmark'
Provides-Extra: parquet
Requires-Dist: pyarrow>=14.0; extra == 'parquet'
Description-Content-Type: text/markdown

# Synthetic Data SDK

Python client library for the Synthetic Data REST API. Generate high-fidelity synthetic data remotely without running heavy ML models locally.

## Requirements

- Python 3.11 to 3.13 (the range declared in the package metadata)
- A running Synthetic Data Server that exposes the REST API endpoint

## Installation

```bash
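# Using pip (editable install from a source checkout)
pip install -e .

# Optional extras declared by the package: parquet, benchmark
pip install -e ".[parquet]"

# Using uv (syncs the project environment and its dependencies)
uv sync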
```

## Features

- **Remote synthesis**: No local GPU or heavy dependencies required
- **Multiple models**: VineCopula, TabularGAN, TabDiff, SMOTE
- **Multi-table support**: Preserve foreign key relationships
- **Privacy evaluation**: Built-in privacy attack simulations (MIA, SARP, linkage attacks)
- **Causal fidelity evaluation**: Validate treatment effects, decision consistency, fairness preservation
- **Pandas integration**: Works seamlessly with DataFrames
- **Quality evaluation**: Built-in metrics for fidelity assessment (KS test, correlation, TSTR/TRTR)
- **Model inspection**: Access model metadata and configuration via `summary()`
- **Unified API**: Consistent interface across all synthesizers

## Quick Start

### Synthetic Data Generation

```python
from synthetic_data_sdk import RemoteVineCopula
import pandas as pd

# Load your data
data = pd.read_csv("customers.csv")

# Initialize remote synthesizer
synth = RemoteVineCopula(
    endpoint="http://localhost:8000",
    model_version="customer-synth-v1"
)

# Fit model (runs on remote server)
synth.fit(data)

# Generate synthetic data
synthetic_data = synth.transform(n=1000)

# Inspect model configuration
summary = synth.summary()
print(f"Fitted: {summary['fitted']}")
print(f"Continuous cols: {summary['n_continuous']}")

# Evaluate synthetic data quality
metrics = synth.evaluate(
    real_data=data,
    synthetic_data=synthetic_data,
    categorical_cols=['city'],
    target_col='churn',
    task_type='classification'
)
print(f"Correlation Error: {metrics['correlation_error']:.3f}")
print(f"TSTR Score: {metrics['tstr_score']:.3f}")
```
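
The quality metrics listed under Features include a KS test. As a rough local illustration of what such a check measures, the sketch below computes the two-sample Kolmogorov-Smirnov statistic for one numeric column in plain Python. The function name `ks_statistic` and the sample values are hypothetical, and this is not the SDK's implementation; `evaluate()` runs on the server and may compute or report its metrics differently.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: largest gap between the two empirical CDFs.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a = sorted(real)
    b = sorted(synthetic)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # those points is enough to find the maximum gap.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Made-up sample values for a single numeric column
real_ages = [23, 35, 41, 52, 60]
close_ages = [24, 36, 40, 50, 61]
shifted_ages = [70, 75, 80, 85, 90]

print(ks_statistic(real_ages, close_ages))    # small gap: similar distributions
print(ks_statistic(real_ages, shifted_ages))  # large gap: disjoint distributions
```

A value near 0 means the two columns have similar distributions, and a value near 1 means they barely overlap.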

### Privacy Risk Evaluation

```python
from synthetic_data_sdk import PrivacyEvaluator
import pandas as pd

# Load datasets
train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")
synthetic_data = pd.read_csv("synthetic.csv")

# Initialize privacy evaluator
privacy = PrivacyEvaluator(endpoint="http://localhost:8000")

# Run privacy attack simulations
results = privacy.evaluate(
    train_real_data=train_data,
    test_real_data=test_data,
    synthetic_data=synthetic_data,
    sensitive_columns=['ssn', 'salary', 'diagnosis']
)

# Check results
print(f"Overall Privacy Risk: {results['overall_risk']}")
print(f"Successful Attacks: {results['summary']['successful_attacks']}/{results['summary']['total_attacks']}")

# Review individual attacks
for attack in results['attacks']:
    print(f"{attack['attack_type']}: {attack['risk_level']} risk")
```
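
When the result contains many attacks, it can help to pull out only the ones above a chosen risk level. The sketch below assumes the result has the same keys used in the example above (`attacks`, `attack_type`, `risk_level`). The helper name `flag_attacks`, the risk level strings (`"high"`, `"medium"`, `"low"`), and the attack type names in the sample are assumptions for illustration; check the actual values your server returns.

```python
def flag_attacks(results, risk_levels=("high",)):
    """Return the attack types whose risk_level is in risk_levels.

    Hypothetical helper; the comparison is case-insensitive because the
    exact casing of risk_level values is an assumption.
    """
    wanted = {level.lower() for level in risk_levels}
    return [
        attack["attack_type"]
        for attack in results.get("attacks", [])
        if str(attack.get("risk_level", "")).lower() in wanted
    ]


# Hand-written sample that mirrors the keys shown above.
# All values are illustrative, not real SDK output.
sample_results = {
    "overall_risk": "medium",
    "summary": {"successful_attacks": 1, "total_attacks": 3},
    "attacks": [
        {"attack_type": "membership_inference", "risk_level": "high"},
        {"attack_type": "linkage", "risk_level": "low"},
        {"attack_type": "attribute_inference", "risk_level": "medium"},
    ],
}

flagged = flag_attacks(sample_results, risk_levels=("high", "medium"))
print(flagged)
```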

### Causal Fidelity Evaluation

```python
from synthetic_data_sdk import CausalEvaluator
import pandas as pd

# Load datasets
real_data = pd.read_csv("real.csv")
synthetic_data = pd.read_csv("synthetic.csv")

# Initialize causal evaluator
causal = CausalEvaluator(endpoint="http://localhost:8000")

# Evaluate treatment effect preservation
results = causal.evaluate(
    real_data=real_data,
    synthetic_data=synthetic_data,
    treatment_col='treatment_assigned',
    outcome_col='outcome_value',
    covariates=['age', 'income'],
    evaluation_type='treatment_effect'
)

# Check results
print(f"Overall Preserved: {results['overall_preserved']}")
print(f"ATE Real: {results['evaluations'][0]['metrics']['ate_real']:.3f}")
print(f"ATE Synthetic: {results['evaluations'][0]['metrics']['ate_synth']:.3f}")
print(f"Relative Error: {results['evaluations'][0]['metrics']['ate_relative_error']:.1%}")
```
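
To make the `ate_real`, `ate_synth`, and `ate_relative_error` metrics more concrete, here is a minimal sketch of one simple way such numbers can be derived: a naive difference in mean outcomes between treated and control rows. This is an assumption for illustration and not necessarily how the server estimates treatment effects (it ignores `covariates` entirely). The helper names, the 0/1 treatment encoding, and the sample rows are hypothetical.

```python
from statistics import mean


def naive_ate(rows, treatment_col, outcome_col):
    """Difference in mean outcome between treated (1) and control (0) rows."""
    treated = [r[outcome_col] for r in rows if r[treatment_col] == 1]
    control = [r[outcome_col] for r in rows if r[treatment_col] == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control row")
    return mean(treated) - mean(control)


def relative_error(ate_real, ate_synth):
    """Absolute gap between the two estimates, relative to the real one."""
    if ate_real == 0:
        raise ValueError("relative error is undefined when ate_real is 0")
    return abs(ate_synth - ate_real) / abs(ate_real)


# Made-up rows using the column names from the example above
real_rows = [
    {"treatment_assigned": 1, "outcome_value": 12.0},
    {"treatment_assigned": 1, "outcome_value": 14.0},
    {"treatment_assigned": 0, "outcome_value": 8.0},
    {"treatment_assigned": 0, "outcome_value": 10.0},
]
synthetic_rows = [
    {"treatment_assigned": 1, "outcome_value": 11.0},
    {"treatment_assigned": 1, "outcome_value": 14.0},
    {"treatment_assigned": 0, "outcome_value": 8.0},
    {"treatment_assigned": 0, "outcome_value": 10.0},
]

ate_real = naive_ate(real_rows, "treatment_assigned", "outcome_value")
ate_synth = naive_ate(synthetic_rows, "treatment_assigned", "outcome_value")
print(f"ATE Real: {ate_real:.3f}")
print(f"ATE Synthetic: {ate_synth:.3f}")
print(f"Relative Error: {relative_error(ate_real, ate_synth):.1%}")
```

A small relative error suggests the synthetic data preserves the size of the treatment effect seen in the real data.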

## Available Synthesizers

| Class | Model Type | Best For | Methods |
|-------|-----------|----------|----------|
| `RemoteVineCopula` | Vine Copula | General tabular data | fit, transform, fit_transform, summary, evaluate |
| `RemoteMultiTableVineCopula` | Multi-table Copula | Relational databases | fit, transform, fit_transform, summary, evaluate, validate_relationships, relational_score, get_table_order |
| `RemoteTabularGAN` | GAN | Complex distributions | fit, transform, fit_transform, summary, evaluate |
| `RemoteTabDiff` | Diffusion Model | High-fidelity synthesis | fit, transform, fit_transform, summary, evaluate |
| `RemoteSMOTE` | Oversampling | Imbalanced datasets | fit, transform, fit_transform, summary, evaluate |
| `PrivacyEvaluator` | Privacy Attacks | Privacy risk assessment | evaluate |
| `CausalEvaluator` | Causal Fidelity | Treatment effects, fairness | evaluate |
| `CertificationClient` | Quality Grading | A+ to F scoring | certify |
| `SynthesisClient` | Low-level HTTP client | Direct API interaction | request, get_job_status |
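
The table lists `get_job_status` on `SynthesisClient` but does not give its signature or return value. The sketch below is therefore a generic polling helper that accepts any callable returning a status string; it does not call the SDK directly. The helper name `wait_for_job`, the state names (`"queued"`, `"running"`, `"completed"`, `"failed"`), and the job ID are all hypothetical.

```python
import time


def wait_for_job(get_status, job_id, poll_interval=2.0, timeout=300.0,
                 done_states=("completed", "failed")):
    """Poll get_status(job_id) until it returns a terminal state.

    get_status is any callable that returns a status string. Wrapping
    SynthesisClient.get_job_status here is an assumption about its shape.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still '{status}' after {timeout}s")
        time.sleep(poll_interval)


# Stand-in status source so the sketch runs without a server
_fake_statuses = iter(["queued", "running", "completed"])
final = wait_for_job(lambda job_id: next(_fake_statuses), "job-123",
                     poll_interval=0.0)
print(final)
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.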

## Documentation

The API reference is available through the package docstrings. For more detail, see the [Online Documentation](https://docs.protegrity.com/).

## Support

- **Issues**: Report bugs and request features via the issue tracker
- **Email**: info@protegrity.com

## License

MIT License
