Metadata-Version: 2.4
Name: dash-synthetic
Version: 0.1.1
Summary: Synthetic data generation for Databricks — with a built-in notebook UI
Project-URL: Homepage, https://github.com/darshan-innovation/dash-synthetic
Author-email: Darshan Shah <darshan.innovation@icloud.com>
License: Apache-2.0
Keywords: databricks,delta,pyspark,synthetic-data
Requires-Python: >=3.9
Requires-Dist: dash-uis>=0.1.0
Requires-Dist: ipywidgets>=8.0
Requires-Dist: numpy>=1.24
Provides-Extra: dev
Requires-Dist: hatch; extra == 'dev'
Requires-Dist: pdoc; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# Dashsynthetic — Databricks Library

[![CI](https://github.com/darshan-innovation/dash-synthetic/actions/workflows/ci.yml/badge.svg)](https://github.com/darshan-innovation/dash-synthetic/actions)
[![PyPI](https://img.shields.io/pypi/v/dash-synthetic)](https://pypi.org/project/dash-synthetic/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE)

Part of the **[Dashlibs](https://github.com/darshan-innovation)** suite — Databricks libraries built for business users.

## Installation

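Run this in a Databricks notebook cell. `%pip` is a notebook magic command; from a regular shell, use `pip install dash-synthetic` instead.

```python
%pip install dash-synthetic
```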

## Quick Start

```python
import dashsynthetic
dashsynthetic.launch()   # Opens the interactive UI in your Databricks notebook
```

The UI has two tabs:
- **Single Table**: profile a source table, DataFrame, or SQL query and generate synthetic data from it.
- **Multi-Table Relationships**: define multiple tables with their primary keys, foreign keys, and
  master data columns (e.g. currency or country codes). The tool works out the generation order from
  the foreign keys (parent tables before child tables) and generates every table with referentially
  valid foreign keys.
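As a rough illustration of the profile-then-generate idea behind the Single Table tab, here is a minimal, stdlib-only sketch. It assumes a very simple per-column profile (value frequencies for categorical columns; mean, standard deviation, and range for numeric ones). The helpers `profile_column` and `generate_column` are hypothetical and are not part of the dashsynthetic API; the library's own profiling is likely more involved.

```python
import random
import statistics
from collections import Counter
from typing import Any, Dict, List


def profile_column(values: List[Any]) -> Dict[str, Any]:
    """Summarise one source column (hypothetical helper, not dashsynthetic API)."""
    if not values:
        raise ValueError("cannot profile an empty column")
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        # Numeric column: keep the mean, spread, and observed range.
        return {
            "kind": "numeric",
            "mean": statistics.fmean(values),
            "stdev": statistics.pstdev(values),
            "min": min(values),
            "max": max(values),
        }
    # Categorical column: keep each distinct value and how often it occurs.
    counts = Counter(values)
    return {"kind": "categorical", "values": list(counts), "weights": list(counts.values())}


def generate_column(profile: Dict[str, Any], n_rows: int, rng: random.Random) -> List[Any]:
    """Draw n_rows synthetic values that follow the profile."""
    if profile["kind"] == "numeric":
        # Normal draw fitted to the source, clipped to the observed min/max.
        return [
            min(profile["max"], max(profile["min"], rng.gauss(profile["mean"], profile["stdev"])))
            for _ in range(n_rows)
        ]
    # Resample observed categories in proportion to their source frequency.
    return rng.choices(profile["values"], weights=profile["weights"], k=n_rows)


# Usage: a tiny in-memory "source table" stands in for a Databricks table.
source = {"country": ["US", "US", "GB", "IN"], "balance": [120.0, 80.5, 300.0, 45.25]}
rng = random.Random(42)  # fixed seed for reproducible output
profiles = {col: profile_column(vals) for col, vals in source.items()}
synthetic = {col: generate_column(p, 10, rng) for col, p in profiles.items()}
```

Fitting a normal distribution and clipping it to the observed range is only one simple choice; it keeps synthetic numeric values plausible but does not preserve skew or correlations between columns.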

## Python API

```python
from dashsynthetic import RelationshipGraph, MultiTableGenerator

# Register each source table under a logical name, with its primary key
graph = RelationshipGraph()
graph.add_table("Customer", table="catalog.schema.dim_customer", primary_key="customer_id")
# currency_code is declared as a master data column (e.g. currency codes)
graph.add_table("Account", table="catalog.schema.fact_account", primary_key="account_id",
                master_data_columns=["currency_code"])
# Account.customer_id references Customer.customer_id
graph.add_foreign_key("Account", "customer_id", "Customer", "customer_id")

# Configure how many rows to generate per table (and an output table for Account)
gen = MultiTableGenerator(graph)
gen.configure_table("Customer", n_rows=5000)
gen.configure_table("Account", n_rows=20000, output_table="catalog.schema.syn_account")
results = gen.run()   # {"Customer": df, "Account": df}, generated in dependency order
```
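The dependency ordering mentioned in the `gen.run()` comment can be illustrated with the standard library's `graphlib.TopologicalSorter` (available from Python 3.9). The sketch below is a simplified, hypothetical model of the same Customer/Account graph using plain dicts; it is not how `MultiTableGenerator` is implemented. The currency code list and the sequential integer keys are assumptions made purely for illustration. It shows the two ideas this README describes: parent tables are generated before their children, and every foreign key is sampled from keys that already exist in the parent.

```python
import random
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Tuple

# Simplified, hypothetical model of the graph in the example above.
primary_keys: Dict[str, str] = {"Customer": "customer_id", "Account": "account_id"}
# Foreign keys as (child_table, child_column, parent_table, parent_column).
foreign_keys: List[Tuple[str, str, str, str]] = [
    ("Account", "customer_id", "Customer", "customer_id"),
]
# Assumed master-data code list for Account.currency_code.
master_data: Dict[str, Dict[str, List[str]]] = {"Account": {"currency_code": ["USD", "EUR", "GBP"]}}
n_rows: Dict[str, int] = {"Customer": 50, "Account": 200}


def generation_order(tables: Iterable[str], fks: List[Tuple[str, str, str, str]]) -> List[str]:
    """Topologically sort tables so every parent precedes its children."""
    sorter = TopologicalSorter({t: set() for t in tables})
    for child, _, parent, _ in fks:
        sorter.add(child, parent)  # child depends on parent
    return list(sorter.static_order())  # raises graphlib.CycleError on circular FKs


def generate_all(seed: int = 0) -> Dict[str, List[dict]]:
    rng = random.Random(seed)
    results: Dict[str, List[dict]] = {}
    for table in generation_order(primary_keys, foreign_keys):
        pk = primary_keys[table]
        # Sequential surrogate keys, purely for illustration.
        rows = [{pk: i + 1} for i in range(n_rows[table])]
        for child, child_col, parent, parent_col in foreign_keys:
            if child == table:
                # The parent was generated earlier, so only existing keys are sampled.
                parent_keys = [r[parent_col] for r in results[parent]]
                for row in rows:
                    row[child_col] = rng.choice(parent_keys)
        for col, codes in master_data.get(table, {}).items():
            # Master data columns only take values from the fixed code list.
            for row in rows:
                row[col] = rng.choice(codes)
        results[table] = rows
    return results


results = generate_all()
print(generation_order(primary_keys, foreign_keys))  # ['Customer', 'Account']
```

Because `static_order()` raises `graphlib.CycleError` when foreign keys form a cycle, a purely topological approach needs an extra strategy (for example, filling one side of the cycle in a second pass) for self-referencing or circular schemas.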

## Part of Dashlibs

| Library | Purpose |
|---|---|
| dash-dq | Data Quality |
| dash-synthetic | Synthetic Data Generation |
| dash-ml | ML Model Monitoring |
| dash-ingest | Data Ingestion |
| dash-gov | Data Governance |
| dash-relate | Ontology & Lineage for AI |

## License

Apache 2.0
