Metadata-Version: 2.4
Name: synthai-generator
Version: 0.1.0
Summary: A framework for generating synthetic data using LLMs with techniques to ensure diversity and reduce bias
Home-page: https://github.com/biswanathroul/synthai
Author: Biswanath Roul
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: transformers>=4.15.0
Requires-Dist: torch>=1.9.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: tqdm>=4.62.0
Requires-Dist: datasets>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0.0; extra == "dev"
Requires-Dist: black>=21.5b2; extra == "dev"
Requires-Dist: isort>=5.9.1; extra == "dev"
Requires-Dist: flake8>=3.9.2; extra == "dev"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# SynthAI: Synthetic Data Generation Framework

[![PyPI version](https://img.shields.io/pypi/v/synthai.svg)](https://pypi.org/project/synthai/)
[![Python Version](https://img.shields.io/pypi/pyversions/synthai.svg)](https://pypi.org/project/synthai/)
[![License](https://img.shields.io/github/license/biswanathroul/synthai.svg)](https://github.com/biswanathroul/synthai/blob/main/LICENSE)

SynthAI is a lightweight framework for generating synthetic data with LLMs. It includes techniques to increase the diversity of the generated data and to reduce bias in it.

## Features

- 🤖 **LLM-Powered Generation**: Create realistic synthetic data using language models
- 🧩 **Domain Adapters**: Specialized components for different data domains (text, tabular, time-series)
- 🔄 **Diversity Enhancement**: Built-in techniques to increase diversity in generated data
- ⚖️ **Bias Reduction**: Methods to detect and mitigate bias in synthetic datasets
- 📊 **Quality Evaluation**: Tools to measure the quality and utility of generated data
- 🚀 **Resource Efficiency**: Designed to work with smaller models and modest compute

## Installation

```bash
pip install synthai-generator
```

## Quick Start

```python
from synthai import SyntheticDataGenerator
from synthai.generators import TextGenerator
from synthai.evaluators import DiversityEvaluator

# Initialize a generator
generator = SyntheticDataGenerator(
    generator_type=TextGenerator(model="distilgpt2"),
    domain="customer_reviews"
)

# Generate synthetic data
synthetic_data = generator.generate(
    num_samples=100,
    prompt_template="Write a {sentiment} review for a {product_type}",
    parameters={
        "sentiment": ["positive", "negative", "neutral"],
        "product_type": ["smartphone", "laptop", "headphones"]
    }
)

# Evaluate diversity of the generated data
evaluator = DiversityEvaluator()
diversity_score = evaluator.evaluate(synthetic_data)
print(f"Diversity score: {diversity_score}")

# Save the generated data
generator.save_data(synthetic_data, "synthetic_reviews.csv")
```
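
To make the roles of `prompt_template` and `parameters` more concrete, here is a minimal standard-library sketch of one way such a template could be expanded into prompts, together with a simple distinct-n ratio as one common way to quantify text diversity. Both `expand_prompts` and `distinct_n` are hypothetical helpers written for illustration. They are not part of the SynthAI API, and SynthAI's generator and `DiversityEvaluator` may work differently.

```python
import itertools


def expand_prompts(prompt_template, parameters, num_samples):
    """Hypothetical helper: fill the template with parameter combinations.

    Cycles through the Cartesian product of all parameter values so that
    every combination is used about equally often.
    """
    names = list(parameters)
    combos = list(itertools.product(*(parameters[name] for name in names)))
    prompts = []
    for i in range(num_samples):
        combo = combos[i % len(combos)]
        prompts.append(prompt_template.format(**dict(zip(names, combo))))
    return prompts


def distinct_n(texts, n=2):
    """Hypothetical metric: unique n-grams divided by total n-grams.

    Values near 1.0 mean little repetition across the texts; values
    near 0.0 mean the texts repeat the same phrases.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


prompts = expand_prompts(
    "Write a {sentiment} review for a {product_type}",
    {
        "sentiment": ["positive", "negative", "neutral"],
        "product_type": ["smartphone", "laptop", "headphones"],
    },
    num_samples=100,
)
print(prompts[0])
print(f"Distinct-2 of the prompts: {distinct_n(prompts, n=2):.3f}")
```

With three sentiments and three product types there are nine distinct prompts, so 100 samples reuse each combination roughly eleven times. Variety in the output then has to come from the model's sampling, which is why measuring diversity on the generated text, rather than on the prompts, is the meaningful check.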

## Documentation

For full documentation, visit [synthai.readthedocs.io](https://synthai.readthedocs.io).

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Author

- **Biswanath Roul** - [GitHub](https://github.com/biswanathroul)

## Acknowledgments

Special thanks to the open-source community and the advancements in LLM technology that make this library possible.
