Metadata-Version: 2.4
Name: sdg_hub
Version: 0.9.3
Summary: Synthetic Data Generation
Author-email: Red Hat AI Innovation <abhandwa@redhat.com>
License: Apache-2.0
Project-URL: homepage, https://ai-innovation.team/
Project-URL: source, https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub
Project-URL: issues, https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/issues
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click<9.0.0,>=8.1.7
Requires-Dist: datasets>=4.0.0
Requires-Dist: httpx<1.0.0,>=0.25.0
Requires-Dist: jinja2
Requires-Dist: litellm<=1.82.6,>=1.73.0
Requires-Dist: mlflow-tracing>=3.1.0
Requires-Dist: mcp<2.0.0,>=1.8.0
Requires-Dist: rich
Requires-Dist: pandas
Requires-Dist: pydantic<3.0.0,>=2.0.0
Requires-Dist: python-dotenv<2.0.0,>=1.0.0
Requires-Dist: tenacity!=8.4.0,>=8.3.0
Requires-Dist: tqdm<5.0.0,>=4.66.2
Provides-Extra: dev
Requires-Dist: coverage>=7.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: nbconvert>=7.0.0; extra == "dev"
Requires-Dist: pre-commit<5.0,>=3.0.4; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: pytest-env; extra == "dev"
Requires-Dist: pytest-html; extra == "dev"
Requires-Dist: ruff>=0.15.0; extra == "dev"
Provides-Extra: code
Requires-Dist: pydantic-monty<0.1.0,>=0.0.10; extra == "code"
Provides-Extra: integration
Requires-Dist: nest-asyncio; extra == "integration"
Provides-Extra: examples
Requires-Dist: tabulate>=0.9.0; extra == "examples"
Requires-Dist: transformers>=4.37.0; extra == "examples"
Requires-Dist: langchain-text-splitters; extra == "examples"
Requires-Dist: docling>=2.3.0; extra == "examples"
Requires-Dist: scikit-learn; extra == "examples"
Requires-Dist: polars; extra == "examples"
Requires-Dist: matplotlib; extra == "examples"
Requires-Dist: spacy; extra == "examples"
Requires-Dist: nltk; extra == "examples"
Requires-Dist: sentence-transformers; extra == "examples"
Requires-Dist: instructor; extra == "examples"
Requires-Dist: fastapi; extra == "examples"
Requires-Dist: ipykernel; extra == "examples"
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.6; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.29; extra == "docs"
Requires-Dist: griffe-pydantic>=1.1; extra == "docs"
Requires-Dist: mkdocs-llmstxt>=0.2; extra == "docs"
Dynamic: license-file

<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="docs/assets/logo-banner-dark.svg">
    <source media="(prefers-color-scheme: light)" srcset="docs/assets/logo-banner-light.svg">
    <img alt="SDG Hub" src="docs/assets/logo-banner-light.svg" width="320">
  </picture>
</p>
<p align="center"><em>Composable blocks and flows for synthetic data generation</em></p>
<p align="center">
  <a href="https://ai-innovation.team/sdg_hub/"><img src="https://img.shields.io/badge/docs-ai--innovation.team-e8975d?style=flat-square" alt="Docs"></a>
  <a href="https://pypi.org/project/sdg-hub/"><img src="https://img.shields.io/pypi/v/sdg-hub?style=flat-square" alt="PyPI"></a>
  <a href="https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml"><img src="https://img.shields.io/github/actions/workflow/status/Red-Hat-AI-Innovation-Team/sdg_hub/test.yml?style=flat-square&label=tests" alt="Tests"></a>
  <img src="https://img.shields.io/badge/python-3.10%2B-brightgreen?style=flat-square" alt="Python 3.10+">
  <a href="https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/blob/main/LICENSE"><img src="https://img.shields.io/github/license/Red-Hat-AI-Innovation-Team/sdg_hub?style=flat-square" alt="License"></a>
  <a href="https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub"><img src="https://img.shields.io/codecov/c/github/Red-Hat-AI-Innovation-Team/sdg_hub?style=flat-square" alt="Coverage"></a>
  <a href="https://deepwiki.com/Red-Hat-AI-Innovation-Team/sdg_hub"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
</p>

---

<p align="center">
  <img src="docs/assets/demo.gif" alt="SDG Hub Demo" width="700">
</p>

SDG Hub is a Python framework for building synthetic data generation pipelines. Chain LLM, parsing, transform, filtering, and agent blocks into YAML-defined flows -- then generate training data at scale.

## Get Started

```bash
pip install sdg-hub
```
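
To confirm the install, the sketch below reads the installed distribution's metadata with the standard library's `importlib.metadata`. It reports the version and the optional extras the package declares (`dev`, `code`, `integration`, `examples`, `docs`). The helper name `describe_install` is hypothetical and not part of SDG Hub.

```python
from importlib.metadata import PackageNotFoundError, metadata


def describe_install(dist_name: str = "sdg_hub"):
    """Return (version, extras) for an installed distribution, or None if absent.

    Hypothetical helper for illustration; it is not part of the sdg_hub API.
    """
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return None
    # "Provides-Extra" lists the optional dependency groups a package declares.
    extras = meta.get_all("Provides-Extra") or []
    return meta["Version"], sorted(extras)


if __name__ == "__main__":
    info = describe_install()
    if info is None:
        print("sdg_hub is not installed in this environment")
    else:
        version, extras = info
        print(f"sdg_hub {version}; optional extras: {', '.join(extras) or 'none'}")
```

An extra is installed by naming it in brackets, for example `pip install "sdg-hub[examples]"`.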

```python
from sdg_hub import FlowRegistry, Flow

# Discover and load a built-in flow
FlowRegistry.discover_flows()
flow = Flow.from_yaml(FlowRegistry.get_flow_path("MCP Server Distillation"))

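# Configure the model, then run the flow on your input data.
# `dataset` is not defined in this snippet: supply your own data
# with the columns the chosen flow expects.
flow.set_model_config(model="openai/gpt-4o")
result = flow.generate(dataset)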
```

See the [Quick Start](docs/quickstart.md) for a full walkthrough, or browse [all built-in flows](docs/flows/built-in-flows.md).

## Documentation

**[Full documentation at ai-innovation.team/sdg_hub](https://ai-innovation.team/sdg_hub/)**

- [Installation](docs/installation.md) -- setup, optional dependencies, development install
- [Quick Start](docs/quickstart.md) -- end-to-end walkthrough from loading a flow to generating data
- [Core Concepts](docs/concepts.md) -- blocks, flows, registries, and dataset handling
- [Block Reference](docs/blocks/) -- LLM, parsing, transform, filtering, agent, and custom blocks
- [Flow Reference](docs/flows/) -- YAML schema, built-in flows, custom flows
- [API Reference](docs/reference/) -- auto-generated from source
- [Contributing](CONTRIBUTING.md) -- development setup and contribution guidelines

## License

Apache License 2.0 -- see [LICENSE](LICENSE).

---

Built by the [Red Hat AI Innovation Team](https://ai-innovation.team)
