Metadata-Version: 2.4
Name: dataxid-syntheval
Version: 0.1.0
Summary: Synthetic data quality evaluation with Polars-native performance and interactive HTML reports.
Project-URL: Homepage, https://dataxid.com
Project-URL: Repository, https://github.com/dataxid/dataxid-syntheval
Project-URL: Issues, https://github.com/dataxid/dataxid-syntheval/issues
Author-email: DataXID <dev@dataxid.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: data-quality,evaluation,polars,privacy,synthetic-data,synthetic-data-evaluation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: dataxid-profiling>=0.3
Requires-Dist: jinja2>=3.1
Requires-Dist: polars>=1.0
Description-Content-Type: text/markdown

# dataxid-syntheval

[![PyPI version](https://img.shields.io/pypi/v/dataxid-syntheval.svg)](https://pypi.org/project/dataxid-syntheval/)
[![Python versions](https://img.shields.io/pypi/pyversions/dataxid-syntheval.svg)](https://pypi.org/project/dataxid-syntheval/)
[![License](https://img.shields.io/pypi/l/dataxid-syntheval.svg)](https://github.com/dataxid/dataxid-syntheval/blob/main/LICENSE)

Evaluate synthetic data quality by comparing an original dataset with its synthetic counterpart and rendering the differences as an interactive HTML report.

## Quickstart

```python
import polars as pl
from dataxid_syntheval import SynthEval

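# Load the original dataset and its synthetic counterpart.
original = pl.read_csv("original.csv")
synthetic = pl.read_csv("synthetic.csv")

# Compare the two datasets and write the interactive HTML report.
se = SynthEval(original=original, synthetic=synthetic)
se.to_html("report.html")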
```

The same comparison is available programmatically through the `diff` attribute:

```python
diffs = se.diff
diffs["column_diffs"]          # per-column stat deltas
diffs["alert_diff"]            # new / resolved alerts
diffs["distribution_overlays"] # histogram & frequency overlays
diffs["correlation_diffs"]     # correlation matrix differences
```

## Features

- **Column-level stat comparison** — mean, std, median, min/max, missing %, distinct count and more
- **Alert change detection** — new and resolved data quality alerts between profiles
- **Distribution overlays** — proportion-based histograms and categorical frequency charts for fair comparison across different dataset sizes
- **Correlation matrix diffs** — Pearson, Spearman, Kendall, Cramér's V, Phik
- **Interactive HTML report** — tabbed column comparison, ECharts visualizations, lazy chart rendering
- Built on [dataxid-profiling](https://github.com/dataxid/dataxid-profiling) and Polars
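
To illustrate why the distribution overlays use proportions rather than raw counts, here is a minimal standard-library sketch. It is not the library's implementation; the helper names `category_proportions` and `overlay` and the sample data are hypothetical, chosen only for this example.

```python
from collections import Counter


def category_proportions(values):
    """Return each category's share of the non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def overlay(original, synthetic):
    """Align both proportion tables on the union of their categories."""
    p = category_proportions(original)
    q = category_proportions(synthetic)
    # A category missing from one side gets a proportion of 0.0 there.
    return {k: (p.get(k, 0.0), q.get(k, 0.0)) for k in sorted(set(p) | set(q))}


# Hypothetical data: the synthetic sample is ten times smaller.
original = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
synthetic = ["a"] * 55 + ["b"] * 35 + ["d"] * 10

rows = overlay(original, synthetic)
for category, (share_original, share_synthetic) in rows.items():
    print(f"{category}: original={share_original:.2f} synthetic={share_synthetic:.2f}")
```

With raw counts, category `a` would look roughly eleven times more frequent in the original simply because that dataset is larger. As proportions, the two shares are directly comparable, and categories present on only one side (`c` and `d` here) show up as zeros on the other.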

## Installation

```bash
pip install dataxid-syntheval
```
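
To confirm that the install succeeded, one option is to query the installed distribution's version with the standard library. The helper name `installed_version` is hypothetical and is used only for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("dataxid-syntheval"))
```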

## Contributing

Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for details.

## Links

- [Changelog](CHANGELOG.md)
- [GitHub Issues](https://github.com/dataxid/dataxid-syntheval/issues)
- [dataxid-profiling](https://github.com/dataxid/dataxid-profiling)

## License

[Apache-2.0](LICENSE)
