Metadata-Version: 2.4
Name: aigen-cli
Version: 1.0.0
Summary: Zero-API browser-based AI dataset generation. No API keys needed — just a browser and a prompt.
Author-email: Bhrigu Verma <kanuphd21@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/bhriguverma/aigen-cli
Project-URL: Documentation, https://github.com/bhriguverma/aigen-cli#readme
Project-URL: Issues, https://github.com/bhriguverma/aigen-cli/issues
Project-URL: Changelog, https://github.com/bhriguverma/aigen-cli/blob/main/update.md
Keywords: ai,dataset,generation,synthetic-data,llm,gemini,chatgpt,cli,no-code
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: selenium>=4.20.0
Requires-Dist: webdriver-manager>=4.0.2
Requires-Dist: typer>=0.9.0
Requires-Dist: rich>=13.7.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: textual>=0.50.0
Requires-Dist: keyring>=25.0.0
Requires-Dist: cryptography>=42.0.0
Requires-Dist: platformdirs>=4.2.0
Requires-Dist: psutil>=6.0.0
Requires-Dist: notify-py>=0.3.7
Requires-Dist: toml>=0.10.2
Requires-Dist: jsonschema>=4.22.0
Requires-Dist: click-spinner>=0.1.10
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: APScheduler>=3.10.0
Requires-Dist: croniter>=2.0.0
Provides-Extra: ml
Requires-Dist: numpy>=1.24.0; extra == "ml"
Requires-Dist: scikit-learn>=1.3.0; extra == "ml"
Requires-Dist: sentence-transformers>=2.2.0; extra == "ml"
Provides-Extra: playwright
Requires-Dist: playwright>=1.40.0; extra == "playwright"
Provides-Extra: all
Requires-Dist: numpy>=1.24.0; extra == "all"
Requires-Dist: scikit-learn>=1.3.0; extra == "all"
Requires-Dist: sentence-transformers>=2.2.0; extra == "all"
Requires-Dist: playwright>=1.40.0; extra == "all"

# aigen-cli

**Zero-API, browser-based AI dataset generation.**

Turn a web AI chat interface (Gemini, ChatGPT, Claude, or Perplexity) into a batch generation engine. No API keys. No per-call costs. Just your browser and a config file.

```
$20/mo subscription → no per-call charge (platform rate limits still apply)
vs
$0.001–$0.024 per API call → ~$5 for 200 items
```

---

## Install

### Via NPM (no Python setup needed)

```bash
npx aigen-cli auth gemini
npx aigen-cli run aigen.yaml

# Or install globally:
npm install -g aigen-cli
aigen auth gemini
```

> The NPM wrapper auto-installs the Python package on first run.

### Via pip

```bash
pip install aigen-cli

# With semantic dedup + quality scoring
# (quotes keep shells such as zsh from expanding the brackets):
pip install "aigen-cli[ml]"

# With Playwright backend:
pip install "aigen-cli[playwright]"

# Everything:
pip install "aigen-cli[all]"
```

---

## Quick Start

### 1. Authenticate once

```bash
aigen auth gemini
```

A Chrome window opens. Sign in, then close the window; your session is saved in encrypted form.

### 2. Create a config

```bash
aigen init
```

An interactive wizard generates `aigen.yaml`.

### 3. Generate

```bash
aigen run aigen.yaml
```

Headless browsers start, and the generated data is written to `output/dataset.json`.
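
To sanity-check a finished run, a short script along these lines can count items per difficulty level. It is a sketch only: it assumes the JSON output is a top-level array of objects and that each object carries the `difficulty` field from the example schema; neither is guaranteed by this README. The `summarize` helper is hypothetical, not part of aigen-cli, and the demo writes a small sample file instead of reading a real run.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize(path, field="difficulty"):
    """Count dataset items per value of `field` (hypothetical helper)."""
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file holds a top-level JSON array of objects.
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    return Counter(
        str(item.get(field, "<missing>"))
        for item in items
        if isinstance(item, dict)
    )


# Demo on a throwaway sample file; point `summarize` at
# output/dataset.json to inspect a real run.
sample = [
    {"question_text": "Solve x + 2 = 5", "difficulty": "easy"},
    {"question_text": "Factor x^2 - 1", "difficulty": "easy"},
    {"question_text": "Prove the quadratic formula", "difficulty": "hard"},
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "dataset.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    counts = summarize(sample_path)
    print(counts["easy"], counts["hard"])  # 2 1
```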

---

## Commands

| Command | Description |
|---------|-------------|
| `aigen auth <platform>` | One-time login. Saves encrypted browser session. Platforms: `gemini`, `chatgpt`, `claude`, `perplexity` |
| `aigen run <config.yaml>` | **Primary command.** Auth-first headless generation from config |
| `aigen generate` | Legacy generation via an already-running Chrome debug session |
| `aigen init` | Interactive wizard to create a generation config |
| `aigen doctor` | System health check (Chrome, Python, keyring, optional deps) |
| `aigen status` | Show generation stats for an output directory |
| `aigen packs` | List built-in domain config packs (medical, legal, ecommerce, code) |
| `aigen push-hf` | Push a completed dataset to the Hugging Face Hub |
| `aigen sessions` | List, export, or import browser sessions |
| `aigen schedule` | Add/remove/run cron-scheduled generation jobs |
| `aigen mcp` | Run as an MCP server for AI agent integration |

Run `aigen --help` or `aigen <command> --help` for full options.

---

## Config File

```yaml
project: "My Dataset"
description: "Short description"
target: 100              # Total items to generate
batch_size: 5            # Items per browser request
agents: 2                # Parallel browser tabs
platforms:
  - gemini
  - chatgpt

schema:
  fields: [question_text, expected_answer, difficulty, marks]
  required: [question_text, expected_answer, difficulty]

topic_pool:
  - chapter: "Chapter 1"
    topic: "Algebra"
    question_type: short_answer
    difficulty: easy
    marks: 3

output:
  format: json           # json | csv | jsonl
  path: output
  filename: dataset.json

# Optional: push to the Hugging Face Hub after generation
huggingface:
  repo_id: "your-username/dataset-name"
  private: false

# Optional: quality + compliance
quality:
  enabled: true          # requires pip install "aigen-cli[ml]"
  mode: heuristic
  threshold: 6.0

compliance:
  pii_detection: true
  auto_redact: true
```
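
To see how `target`, `batch_size`, and `agents` interact, the sketch below splits a target into per-request batches and deals them round-robin to agents. This is an illustration under stated assumptions, not aigen-cli's actual scheduler: `plan_batches` is a hypothetical helper, and the round-robin assignment is a guess at one reasonable policy.

```python
import math


def plan_batches(target: int, batch_size: int, agents: int) -> list:
    """Return one list of batch sizes per agent (hypothetical helper)."""
    if target < 0 or batch_size <= 0 or agents <= 0:
        raise ValueError("target must be >= 0; batch_size and agents must be > 0")
    n_requests = math.ceil(target / batch_size)
    sizes = [batch_size] * n_requests
    remainder = target % batch_size
    if remainder:
        sizes[-1] = remainder  # the last request only asks for what is left
    plan = [[] for _ in range(agents)]
    for i, size in enumerate(sizes):
        plan[i % agents].append(size)  # assumed policy: round-robin
    return plan


# Values from the example config: 100 items, 5 per request, 2 tabs.
plan = plan_batches(target=100, batch_size=5, agents=2)
print([len(batches) for batches in plan])  # [10, 10]
```

With the example values, each of the two tabs would handle 10 requests of 5 items. Raising `batch_size` reduces the number of browser requests, at the cost of longer responses per request.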

---

## Domain Packs

Ready-made configs for common use cases:

```bash
aigen packs                          # list available packs
aigen generate --pack medical_mcq    # run a built-in pack
```

Built-in packs: `medical_mcq`, `legal_qa`, `ecommerce`, `code_review`

---

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                      aigen-cli                           │
│                                                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
│  │ Gemini   │  │ ChatGPT  │  │  Claude  │  │Perplexity│ │
│  │  Tab     │  │   Tab    │  │   Tab    │  │   Tab    │ │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘ │
│       └──────────────┴─────────────┴──────────────┘       │
│                          │                                │
│              ┌───────────▼───────────┐                    │
│              │    BrowserPool        │                    │
│              │  (auth-first agents)  │                    │
│              └───────────┬───────────┘                    │
│                          │                                │
│              ┌───────────▼───────────┐                    │
│              │  GenerationEngine     │                    │
│              └───────────┬───────────┘                    │
│                          │                                │
│    ┌─────────────────────┼─────────────────────┐         │
│    │                     │                     │         │
│ ┌──▼──┐  ┌───────────┐ ┌▼────────┐  ┌────────▼──┐      │
│ │Parse│  │  Validate  │ │ Dedup   │  │  Output   │      │
│ │JSON │  │  (schema)  │ │ Engine  │  │ Writers   │      │
│ └─────┘  └───────────┘ └─────────┘  └───────────┘      │
└─────────────────────────────────────────────────────────┘
```
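
The four boxes at the bottom of the diagram can be sketched as a small pipeline: parse, validate, dedup, write. The code below is a minimal illustration, not the project's implementation. All function names are hypothetical, the required fields are copied from the example config, and the dedup step uses exact matching on normalized question text, which is simpler than the semantic dedup offered by the `ml` extra.

```python
import hashlib
import json
import re

# Required fields taken from the example config's `schema.required`.
REQUIRED = ("question_text", "expected_answer", "difficulty")


def parse_items(raw: str) -> list:
    """Parse: pull the outermost JSON array out of a raw chat response."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    return [item for item in data if isinstance(item, dict)]


def is_valid(item: dict) -> bool:
    """Validate: every required field is present and non-empty."""
    return all(item.get(key) not in (None, "") for key in REQUIRED)


def fingerprint(item: dict) -> str:
    """Dedup key: hash of the lowercased, whitespace-collapsed question."""
    text = re.sub(r"\s+", " ", str(item["question_text"])).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def process(raw_responses) -> list:
    """Run parse -> validate -> dedup over a sequence of raw responses."""
    seen, kept = set(), []
    for raw in raw_responses:
        for item in parse_items(raw):
            if not is_valid(item):
                continue
            key = fingerprint(item)
            if key not in seen:
                seen.add(key)
                kept.append(item)
    return kept


def to_jsonl(items) -> str:
    """Output: serialize the kept items as JSON Lines."""
    return "\n".join(json.dumps(item, ensure_ascii=False) for item in items)


raw = (
    'Here you go:\n'
    '[{"question_text": "Solve x + 2 = 5", "expected_answer": "x = 3", "difficulty": "easy"},'
    ' {"question_text": "solve  x + 2 = 5", "expected_answer": "3", "difficulty": "easy"},'
    ' {"question_text": "No answer given"}]'
)
print(len(process([raw])))  # 1
```

In the sample, the second item is dropped as a duplicate of the first after normalization, and the third fails validation because it lacks `expected_answer` and `difficulty`.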

---

## Why This Exists

API costs scale linearly with usage. Generating 1,000 items via API can cost $20–50. Access to the same models through the web interface is included in a flat $20/month subscription.
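
The break-even point is simple arithmetic. The sketch below uses the per-call price range quoted at the top of this README ($0.001 to $0.024) and assumes one API call per generated item, which real workloads may not match (retries or multi-call items raise the API cost). `api_cost` and `break_even_items` are illustrative names.

```python
from decimal import Decimal, ROUND_CEILING

SUBSCRIPTION_USD = Decimal("20")   # flat monthly subscription
PRICE_LOW = Decimal("0.001")       # quoted low end, USD per API call
PRICE_HIGH = Decimal("0.024")      # quoted high end, USD per API call


def api_cost(items: int, price_per_call: Decimal) -> Decimal:
    """API cost in USD, assuming one call per item."""
    return items * price_per_call


def break_even_items(price_per_call: Decimal) -> int:
    """Smallest item count whose API cost reaches the subscription price."""
    ratio = SUBSCRIPTION_USD / price_per_call
    return int(ratio.to_integral_value(rounding=ROUND_CEILING))


print(api_cost(200, PRICE_HIGH))     # 4.800
print(break_even_items(PRICE_HIGH))  # 834
print(break_even_items(PRICE_LOW))   # 20000
```

Under these assumptions, the subscription pays for itself somewhere between roughly 834 and 20,000 items per month, depending on the per-call price.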

aigen-cli automates the web interface so you get:
- **Unlimited generations**: no per-item charge; throughput is capped by each platform's anti-bot limits, not by your billing
- **Zero API keys**: log in once, like a normal user
- **Full transparency**: every raw response is saved for audit
- **No vendor lock-in**: swap platforms by editing the config

---

## License

MIT
