Metadata-Version: 2.4
Name: datacrafter-ai
Version: 1.0.1
Summary: Datacrafter — AI-based, schema-driven synthetic data generator with a plugin architecture.
Author: Mahalakshmi Shanmuga Sundaram
License: MIT
Keywords: synthetic-data,data-generator,fake-data,yaml,csv,json,xml,datacrafter,ai-based,test-data,plugin
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyYAML>=6.0.1
Requires-Dist: Faker>=20.0.0
Requires-Dist: click>=8.1.7
Requires-Dist: python-dateutil>=2.8.2
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: rstr>=3.2.0
Dynamic: license-file

# Datacrafter

**AI-powered, schema-driven synthetic data generation platform.**

Define datasets in **YAML** or generate the schema from a **natural-language** prompt, then produce realistic data in **CSV / JSON / JSONL / XML / Parquet** formats.

---

## ✨ Key Highlights

* Schema-driven generation using YAML
* AI-powered schema creation from natural language prompts
* Formula engine for dynamic, cross-field computations
* Deterministic output using seed control
* Multiple formats: CSV, JSON, JSONL, XML, Parquet
* Plugin architecture for extensibility
* CLI + Python API

---

## 🤖 AI Schema Generation

Generate a dataset schema from a natural-language prompt and save it to a file:

```bash
datacrafter ai --prompt "
Generate a banking transactions dataset with:

transaction_id (uuid),
account_number (integer between 100000000000 and 999999999999),
transaction_type (categorical: Debit, Credit, Withdrawal, Deposit, Transfer),
amount (float between 50 and 50000),
currency (categorical: USD, EUR, GBP, INR),
timestamp (datetime between 2023-01-01 and 2025-12-31 in '%Y-%m-%d %H:%M:%S'),
merchant_name (categorical: Amazon, Walmart, Starbucks, Uber, Apple Store, Shell Fuel Station, Best Buy, ATM Withdrawal, Bank Transfer).

Output format: xml.
Return ONLY valid Datacrafter YAML schema.
" --out examples/banking_transactions.yaml
```

✔ The AI-generated schema is saved to the file passed to `--out`.

---

## 🧮 Formula Engine

Create dynamic fields using expressions:

```yaml
total_price:
  type: formula
  expr: "price * quantity"
```

Supports:

* Arithmetic operations
* Comparisons and boolean logic
* Ternary expressions
* String concatenation
* Cross-field access
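
To illustrate how a formula field can be computed from other fields in the same row, here is a minimal sketch of a restricted expression evaluator built on Python's `ast` module. It is not Datacrafter's implementation: `eval_formula` is a hypothetical helper, and the ternary form uses Python's `a if cond else b` syntax as an assumption, since Datacrafter's own expression syntax is not shown here.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_CMP_OPS = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
}


def eval_formula(expr, row):
    """Evaluate `expr` against the field values in `row` (hypothetical helper)."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return row[node.id]  # cross-field access
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = _CMP_OPS.get(type(node.ops[0]))
            if op is not None:
                return op(walk(node.left), walk(node.comparators[0]))
        if isinstance(node, ast.BoolOp):
            # Evaluates every operand and returns a plain bool.
            values = [walk(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.IfExp):  # ternary expression
            return walk(node.body) if walk(node.test) else walk(node.orelse)
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expr, mode="eval"))


row = {"price": 2.5, "quantity": 4, "first": "Ada", "last": "Lovelace"}
total = eval_formula("price * quantity", row)
tier = eval_formula("'bulk' if quantity >= 10 else 'single'", row)
full_name = eval_formula("first + ' ' + last", row)
```

Walking a whitelisted syntax tree, rather than calling `eval`, means function calls and attribute access inside a formula are rejected instead of executed.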

---

## 📦 Installation

```bash
pip install datacrafter-ai
```

**Requirements:** Python 3.9+

---

## 🚀 Quickstart

### 1. Create Schema

Save the following as `schema.yaml`:

```yaml
version: 1
rows: 10

fields:
  price:
    type: float

  quantity:
    type: integer

  total:
    type: formula
    expr: "price * quantity"

output:
  format: csv
  path: ./output/data.csv
```
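
To make the schema concrete, the following stdlib-only sketch produces rows of the same shape (`price`, `quantity`, and a computed `total`) and serializes them as CSV. It does not use Datacrafter. The value ranges, the `seed` argument, and the function names are assumptions for illustration, since the schema above sets no ranges and this document does not show how a seed is configured.

```python
import csv
import io
import random


def generate_rows(rows, seed):
    """Yield dicts shaped like the schema: price, quantity, total."""
    rng = random.Random(seed)  # private RNG, so the seed fully determines output
    for _ in range(rows):
        price = round(rng.uniform(1.0, 100.0), 2)  # assumed range
        quantity = rng.randint(1, 10)              # assumed range
        yield {
            "price": price,
            "quantity": quantity,
            "total": round(price * quantity, 2),   # expr: "price * quantity"
        }


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["price", "quantity", "total"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


first = to_csv(generate_rows(rows=10, seed=42))
second = to_csv(generate_rows(rows=10, seed=42))
```

Because each call builds its own `random.Random(seed)`, `first` and `second` are identical, which is the property that seed control is meant to provide.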

---

### 2. Generate Data

```bash
datacrafter generate --schema schema.yaml
```

---

## 🧩 Built-in Capabilities

### Providers

* uuid, id.incremental
* integer, float, boolean
* person.*, text.*, string.regex
* datetime, categorical, geo.country
* formula

### Features

* Unique constraints
* Null handling
* Regex validation
* Distributions
* Templating & dependencies
* Cross-field computation
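
As a rough illustration of two of these features, the sketch below shows one way to enforce a unique constraint (re-draw on collision) and to inject nulls at a given rate. `unique_values` and `with_nulls` are hypothetical helpers, not Datacrafter functions, and the tool's actual strategy may differ.

```python
import random


def unique_values(make_value, count, max_attempts=10_000):
    """Collect `count` distinct values, re-drawing whenever a duplicate appears."""
    seen = set()
    values = []
    for _ in range(max_attempts):
        if len(values) == count:
            break
        candidate = make_value()
        if candidate not in seen:
            seen.add(candidate)
            values.append(candidate)
    if len(values) < count:
        raise RuntimeError("could not draw enough distinct values")
    return values


def with_nulls(values, null_rate, rng):
    """Replace each value with None with probability `null_rate`."""
    return [None if rng.random() < null_rate else v for v in values]


rng = random.Random(7)
ids = unique_values(lambda: rng.randint(1, 50), count=20)
sparse = with_nulls(ids, null_rate=0.3, rng=rng)
```

The attempt cap matters: asking for more unique values than the generator can produce would otherwise loop forever.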

---

## 🖥️ CLI Commands

```bash
datacrafter generate --schema schema.yaml
datacrafter validate --schema schema.yaml
datacrafter list providers
datacrafter list writers
datacrafter init --template minimal
datacrafter ai --prompt "..." --out schema.yaml
```
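
When the CLI is driven from a script, the `validate` and `generate` commands can be chained with `subprocess`. This sketch uses only the `--schema` flag shown above. The helper names are hypothetical, and it assumes that `datacrafter validate` exits with a non-zero status for an invalid schema, which this document does not state.

```python
import shutil
import subprocess


def datacrafter_cmd(action, schema_path):
    """Build the argument list for `datacrafter <action> --schema <path>`."""
    if action not in {"validate", "generate"}:
        raise ValueError(f"unsupported action: {action}")
    return ["datacrafter", action, "--schema", str(schema_path)]


def validate_then_generate(schema_path):
    """Run `validate` first and call `generate` only if it succeeds."""
    if shutil.which("datacrafter") is None:
        raise FileNotFoundError("datacrafter CLI not found on PATH")
    for action in ("validate", "generate"):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(datacrafter_cmd(action, schema_path), check=True)
```

Calling `validate_then_generate("schema.yaml")` then runs both steps in order and stops at the first failure.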

---

## 🔌 Extensibility

Datacrafter supports plugins for:

* Custom providers
* Custom writers

Plugins are added without modifying the Datacrafter core.
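
A common way to get this kind of extensibility is a name-to-callable registry that plugins add themselves to. The sketch below shows that general pattern only. `PROVIDERS`, `register_provider`, and the `color.hex` provider are hypothetical, and Datacrafter's real plugin interface and discovery mechanism are not described in this document.

```python
import random

PROVIDERS = {}  # hypothetical registry: provider name -> callable


def register_provider(name):
    """Decorator that adds a provider function to the registry."""
    def decorator(func):
        if name in PROVIDERS:
            raise ValueError(f"provider already registered: {name}")
        PROVIDERS[name] = func
        return func
    return decorator


@register_provider("color.hex")
def hex_color(rng):
    """Return a random color string such as '#1a2b3c'."""
    return f"#{rng.randrange(0x1000000):06x}"


# The core looks providers up by name and never imports the plugin directly.
value = PROVIDERS["color.hex"](random.Random(0))
```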

---

## ⚙️ AI Configuration

Datacrafter’s AI features support multiple LLM providers and require API credentials.

### 1. Create a `.env` file

Copy the example configuration:

```bash
cp .env.example .env
```

---

### 2. Configure your provider and model

Edit `.env` and choose one provider:

```env
# Choose one provider: openrouter / openai / gemini / groq / deepseek
LLM_PROVIDER=openai

# Choose the model supported by the provider
LLM_MODEL=gpt-4
```

---

### 3. Add the corresponding API key

Provide ONLY the API key for your selected provider:

```env
OPENAI_API_KEY=your_api_key_here
```

Examples for other providers:

```env
OPENROUTER_API_KEY=your_key
GEMINI_API_KEY=your_key
GROQ_API_KEY=your_key
DEEPSEEK_API_KEY=your_key
```
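
A small pre-flight check can catch a missing or mismatched key before the AI command is run. The provider names and variable names below are taken from the `.env` examples above. `check_ai_config` itself is a hypothetical helper, not part of Datacrafter.

```python
import os

# Provider names and key variables as listed in the `.env` examples.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def check_ai_config(env):
    """Return a list of problems found in `env`; an empty list means none were found."""
    provider = env.get("LLM_PROVIDER", "").strip()
    if provider not in API_KEY_VARS:
        return [f"LLM_PROVIDER must be one of: {', '.join(sorted(API_KEY_VARS))}"]
    problems = []
    if not env.get("LLM_MODEL", "").strip():
        problems.append("LLM_MODEL is not set")
    key_var = API_KEY_VARS[provider]
    if not env.get(key_var, "").strip():
        problems.append(f"{key_var} is not set for provider '{provider}'")
    return problems


problems = check_ai_config(os.environ)
```

Values from `.env` appear in `os.environ` only after something has loaded the file, for example `load_dotenv()` from the `python-dotenv` package. The check cannot tell whether a key is actually valid or whether the model is supported by the provider.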

---

### 4. Run AI schema generation

```bash
datacrafter ai --prompt "..." --out schema.yaml
```

---

> ⚠️ Important:
>
> * AI features will NOT work without valid API credentials
> * Only one provider needs to be configured
> * Ensure the selected model is supported by the chosen provider

---

## 📦 Development

```bash
python -m build
twine check dist/*
twine upload dist/*
```

---

## 🔒 License

MIT © 2026 Mahalakshmi Shanmuga Sundaram

---

## 🏢 About

Datacrafter is developed and maintained by **DHS Tech Services**.
