Metadata-Version: 2.1
Name: synthetic-eval
Version: 0.0.8
Summary: Package for Evaluation of Synthetic Tabular Data Quality
Home-page: https://github.com/an-seunghwan/synthetic_eval
Author: Seunghwan An
Author-email: dpeltms79@gmail.com
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: backports.tarfile==1.2.0
Requires-Dist: certifi==2024.7.4
Requires-Dist: charset-normalizer==3.3.2
Requires-Dist: docutils==0.21.2
Requires-Dist: filelock==3.15.4
Requires-Dist: fsspec==2024.6.1
Requires-Dist: idna==3.7
Requires-Dist: importlib_metadata==8.2.0
Requires-Dist: jaraco.classes==3.4.0
Requires-Dist: jaraco.context==5.3.0
Requires-Dist: jaraco.functools==4.0.1
Requires-Dist: Jinja2==3.1.4
Requires-Dist: joblib==1.4.2
Requires-Dist: keyring==25.2.1
Requires-Dist: markdown-it-py==3.0.0
Requires-Dist: MarkupSafe==2.1.5
Requires-Dist: mdurl==0.1.2
Requires-Dist: more-itertools==10.3.0
Requires-Dist: mpmath==1.3.0
Requires-Dist: networkx==3.3
Requires-Dist: nh3==0.2.18
Requires-Dist: numpy==1.26.4
Requires-Dist: pandas==2.2.2
Requires-Dist: pip==24.0
Requires-Dist: pkginfo==1.10.0
Requires-Dist: Pygments==2.18.0
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: pytz==2024.1
Requires-Dist: readme_renderer==44.0
Requires-Dist: requests==2.32.3
Requires-Dist: requests-toolbelt==1.0.0
Requires-Dist: rfc3986==2.0.0
Requires-Dist: rich==13.7.1
Requires-Dist: scikit-learn==1.5.1
Requires-Dist: scipy==1.14.0
Requires-Dist: setuptools==69.5.1
Requires-Dist: six==1.16.0
Requires-Dist: sympy==1.13.1
Requires-Dist: threadpoolctl==3.5.0
Requires-Dist: torch==2.2.2
Requires-Dist: tqdm==4.66.4
Requires-Dist: twine==5.1.1
Requires-Dist: typing_extensions==4.12.2
Requires-Dist: tzdata==2024.1
Requires-Dist: urllib3==2.2.2
Requires-Dist: wheel==0.43.0
Requires-Dist: zipp==3.19.2

# Synthetic-Eval

**Synthetic-Eval** is a package for the comprehensive evaluation of synthetic tabular datasets.

### 1. Installation
Install using pip:
```
pip install synthetic-eval
```

### 2. Supported Metrics
- **Statistical Fidelity**
  1. KL-Divergence (`KL`)
  2. Goodness-of-Fit (Kolmogorov-Smirnov test & Chi-Squared test) (`GoF`)
  3. Maximum Mean Discrepancy (`MMD`)
  4. Cramer-Wold Distance (`CW`)
  5. (naive) $\alpha$-precision & $\beta$-recall (`alpha_precision`, `beta_recall`)
- **Machine Learning Utility** (classification task) 
  1. Accuracy (`base_cls`, `syn_cls`)
  2. Model Selection Performance (`model_selection`)
  3. Feature Selection Performance (`feature_selection`)
- **Privacy Preservation**
  1. $k$-Anonymization (`Kanon_base`, `Kanon_syn`)
  2. $k$-Map (`KMap`)
  3. Distance to Closest Record (`DCR_RS`, `DCR_RR`, `DCR_SS`)
  4. Attribute Disclosure (`AD`)
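
To give a feel for what a distance-based privacy metric measures, here is a minimal sketch of a Distance to Closest Record computation on toy numeric data. It is an illustration only, not the package's implementation: the helper name `distance_to_closest_record` and the toy rows are assumptions, and the package may differ in scaling, in how it handles categorical columns, and in how it aggregates the distances.

```python
import math

def distance_to_closest_record(queries, references):
    """For each row in `queries`, return the Euclidean distance
    to its nearest row in `references` (hypothetical helper)."""
    return [
        min(math.dist(q, r) for r in references)
        for q in queries
    ]

# Toy numeric rows, made up for illustration.
real = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
synthetic = [(0.0, 1.0), (4.0, 3.0)]

dcr = distance_to_closest_record(synthetic, real)
print(dcr)  # [1.0, 3.0]
```

Very small distances suggest that synthetic rows sit close to real records, which is why such distances are usually compared against a real-to-real baseline.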

### 3. Usage
```python
from synthetic_eval import evaluation
evaluation.evaluate # function for evaluating synthetic data quality
```
- See [example.ipynb](example.ipynb) for a detailed example and its results on the `loan` dataset.
  - Download link for the `loan` dataset: [https://www.kaggle.com/datasets/teertha/personal-loan-modeling](https://www.kaggle.com/datasets/teertha/personal-loan-modeling)

#### Example
```python
"""import libraries"""
import pandas as pd
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

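"""the ground-truth (training, test) and synthetic dataset"""
data = pd.read_csv('./loan.csv')
# len(data) # 5,000
train = data.iloc[:2000]
test = data.iloc[2000:4000]
# held-out real rows stand in for synthetic data here;
# replace this with the output of your own generative model
syndata = data.iloc[4000:]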

"""specify column types"""
continuous_features = [
    'Age',
    'Experience',
    'Income', 
    'CCAvg',
    'Mortgage',
]
categorical_features = [
    'Family',
    'Personal Loan',
    'Securities Account',
    'CD Account',
    'Online',
    'CreditCard'
]
target = 'Personal Loan' # machine learning utility target column

"""load Synthetic-Eval"""
from synthetic_eval import evaluation
results = evaluation.evaluate(
    syndata, train, test, 
    target, continuous_features, categorical_features, device
)

"""print results"""
for x, y in results._asdict().items():
    print(f"{x}: {y:.3f}")
```
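
Because the example calls `results._asdict()`, the returned object behaves like a named tuple, so metrics can be filtered by name. The sketch below groups the statistical-fidelity metrics. It uses a hypothetical stand-in for `results`: the field names are assumed to match the codes in the Supported Metrics list, and the values are placeholders, not real outputs.

```python
from collections import namedtuple

# Hypothetical stand-in for the object returned by evaluation.evaluate.
# Field names are assumed to match the metric codes listed above;
# the values are placeholders, not real results.
Results = namedtuple("Results", ["KL", "MMD", "base_cls", "syn_cls"])
results = Results(KL=0.0, MMD=0.0, base_cls=0.0, syn_cls=0.0)

# Metric codes taken from the "Statistical Fidelity" list.
fidelity_keys = {"KL", "GoF", "MMD", "CW", "alpha_precision", "beta_recall"}

# Keep only the metrics that belong to the fidelity group.
fidelity = {
    name: value
    for name, value in results._asdict().items()
    if name in fidelity_keys
}

for name, value in fidelity.items():
    print(f"{name}: {value:.3f}")
```

The same pattern applies to the utility and privacy groups by swapping in their metric codes.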

### 4. References
  - https://github.com/vanderschaarlab/synthcity/blob/main/src/synthcity/metrics
  - https://github.com/HLasse/data-centric-synthetic-data
