Metadata-Version: 2.4
Name: feedloop
Version: 1.5.3
Summary: The fastest way to collect human preference data for LLMs
Project-URL: Homepage, https://turingspark.com/tools/feedloop
Project-URL: Documentation, https://turingspark.com/tools/feedloop
Project-URL: Repository, https://github.com/turing-spark/feedloop
Author: Turing Spark
License-Expression: MIT
License-File: LICENSE
Keywords: dpo,human-feedback,llm,preference-data,rlhf
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: fastapi>=0.100
Requires-Dist: pydantic>=2.0
Requires-Dist: uvicorn[standard]>=0.20
Provides-Extra: dev
Requires-Dist: httpx>=0.24; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# feedloop

**The fastest way to collect human preference data for LLMs.**

feedloop is a free developer tool that lets you collect human feedback on LLM outputs — directly from your Python code, with zero configuration. Submit pairs of model responses, review them in a local browser UI, and export a fine-tuning dataset in minutes.

---

## Features

- **Zero setup** — `pip install feedloop` and you're running
- **Local-first** — everything runs on your machine, no cloud account needed
- **Non-blocking SDK** — `feedloop.compare()` returns immediately, your script keeps running
- **Built-in review UI** — side-by-side browser interface with keyboard shortcuts
- **Position randomization** — A/B display order is shuffled to prevent left-side bias in human evaluations
- **DPO-ready export** — outputs standard `{"prompt", "chosen", "rejected"}` JSONL
- **Uncertainty filtering** — skip low-uncertainty comparisons automatically and focus human attention where it matters
- **Training script included** — generates a ready-to-run TRL DPO fine-tuning script from your data
- **Session scoping** — each run gets its own isolated session, while all data persists in SQLite
- **Model agnostic** — works with OpenAI, Anthropic, Hugging Face, Ollama, or any LLM

---

## Use Cases

- **Model comparison** — compare GPT-4o vs Claude, or two versions of your own model
- **Fine-tuning data collection** — build a DPO preference dataset without a labelling platform
- **Evaluation loops** — quickly understand where your model falls short by seeing what humans prefer
- **Active learning** — use uncertainty scores to only review the comparisons that matter most
- **Iterative improvement** — collect feedback → fine-tune → re-run → repeat

---

## Installation

```bash
pip install feedloop
```

Requires Python 3.10+. No accounts or external services needed; the only dependencies (FastAPI, Pydantic, Uvicorn) are installed automatically.

---

## Quick Start

```python
import feedloop

# Start the server (opens browser automatically)
feedloop.start()

# Submit pairs of outputs for review
feedloop.compare(
    prompt="Explain recursion to a 10-year-old.",
    outputs=[
        "Recursion is when a function calls itself...",
        "Imagine you're looking for a book in a library...",
    ],
    metadata={"model_a": "gpt-4o-mini", "model_b": "gpt-4o"},
)

# Rate in the browser, then export
feedloop.export("preferences.jsonl")
```

A browser tab opens at `http://localhost:7856`. Pick the better response with a click or use keyboard shortcuts:

| Key | Action |
|-----|--------|
| `1` | Choose the left response |
| `2` | Choose the right response |
| `S` | Skip — neither response is clearly better |

When you're done, export to a DPO-ready JSONL file.

---

## Full API Reference

### `feedloop.start()`

```python
feedloop.start(
    port=7856,               # port for the local server
    db_path=None,            # SQLite path — defaults to ~/.feedloop/feedloop.db
    open_browser=True,       # auto-open browser on start
    uncertainty_threshold=0.0,  # see Uncertainty Filtering below
)
```

Launches the review server in a background thread. Idempotent — calling it twice reuses the running server. Automatically calls `feedloop.stop()` when your script exits.

---

### `feedloop.compare()`

```python
comparison_id = feedloop.compare(
    prompt="Your prompt here",
    outputs=["Response A", "Response B"],
    uncertainty=None,   # optional float 0.0–1.0
    metadata=None,      # optional dict — stored with the record
)
```

Submits a comparison for human review. Non-blocking — returns a `comparison_id` immediately. The A/B display order is randomized automatically to prevent position bias.

---

### `feedloop.wait()`

```python
# Block until a specific comparison is rated
result = feedloop.wait(comparison_id="abc123", timeout=60)
# → {"prompt": "...", "chosen": "...", "rejected": "...", "auto_skipped": False}

# Block until ALL pending comparisons in the session are rated
result = feedloop.wait(timeout=None)
# → {"completed": 10, "total": 12}

# Returns None on timeout
```

Useful when you want to act on feedback immediately — for example, in a pipeline that fine-tunes on each batch of ratings before generating the next round.
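
A minimal sketch of such a loop, assuming `prompts` is your own list of prompts and `generate_pair()` is your own helper that returns two candidate responses:

```python
import feedloop

feedloop.start()

for prompt in prompts:
    a, b = generate_pair(prompt)  # your own generation code
    feedloop.compare(prompt=prompt, outputs=[a, b])

# Block until every pending comparison has been rated in the browser
feedloop.wait(timeout=None)

# Export this round's ratings and hand them to your fine-tuning step
feedloop.export("round_01.jsonl")
```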

---

### `feedloop.status()`

```python
feedloop.status()
# → {"pending": 3, "completed": 7, "skipped": 1, "auto_skipped": 2, "total": 13}
```

Returns counts for the current session. Useful for progress checks in long-running scripts.
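
For example, a hypothetical progress loop that polls until everything has been rated (the ten-second interval is arbitrary):

```python
import time

import feedloop

status = feedloop.status()
while status["pending"] > 0:
    print(f"{status['completed']}/{status['total']} rated, {status['pending']} pending")
    time.sleep(10)
    status = feedloop.status()
```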

---

### `feedloop.export()`

```python
count = feedloop.export(
    path="preferences.jsonl",   # output file path
    format="dpo",               # only "dpo" supported in v1.x
)
```

Exports all human-rated comparisons from the current session to JSONL. Auto-skipped comparisons are excluded. Returns the number of rows exported.

---

### `feedloop.stop()`

```python
feedloop.stop()
```

Shuts down the background server and closes the database connection. Called automatically via `atexit` when your script exits — but useful to call explicitly in notebooks or long-lived processes where you want to release resources before the session ends.

---

## Uncertainty-Based Filtering

Only review comparisons where your model is unsure — skip the easy ones automatically:

```python
feedloop.start(uncertainty_threshold=0.6)

feedloop.compare(
    prompt="...",
    outputs=[response_a, response_b],
    uncertainty=0.85,  # above threshold → sent to human
)

feedloop.compare(
    prompt="...",
    outputs=[response_a, response_b],
    uncertainty=0.3,   # below threshold → auto-skipped
)
```

The `uncertainty` score is provided by you — feedloop just filters on it. How you compute it depends on your model:

- **Open-weight models** (Llama, Mistral): use token log-probabilities
- **Any API**: sample the same prompt multiple times and measure response disagreement — high variance = high uncertainty (see the sketch after this list)
- **Always review everything**: omit `uncertainty` entirely (default behavior)
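
For the sampling approach, here is a rough sketch. It assumes `sample()` is your own function that returns one completion per call, and that exact string matching is a good-enough measure of disagreement for your task:

```python
from collections import Counter

def disagreement_uncertainty(prompt, n=5):
    """Sample the same prompt n times and score how much the answers disagree."""
    samples = [sample(prompt) for _ in range(n)]  # sample() is your own model call
    # 1.0 minus the share of the most common answer: 0.0 when all samples agree,
    # approaching 1.0 when every sample is different.
    most_common_count = Counter(samples).most_common(1)[0][1]
    return 1.0 - most_common_count / n
```

Pass the result straight through as the `uncertainty` argument to `feedloop.compare()`.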

---

## CLI Usage

You can run feedloop as a standalone review server — useful for reviewing data collected in a previous session:

```bash
feedloop --port 7856 --db ~/.feedloop/feedloop.db
```

Or via Python:

```bash
python -m feedloop --port 7856 --no-browser
```

Options:

| Flag | Default | Description |
|------|---------|-------------|
| `--port` | `7856` | Port to listen on |
| `--db` | `~/.feedloop/feedloop.db` | Path to SQLite database |
| `--no-browser` | off | Don't open browser automatically |

---

## Exported Data Format

```jsonl
{"prompt": "Explain recursion...", "chosen": "Imagine you're looking for a book...", "rejected": "Recursion is when a function calls itself..."}
```

Compatible with [TRL](https://github.com/huggingface/trl) `DPOTrainer`, [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), and any custom pipeline that accepts preference pairs.
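
For example, one way to load the exported file with the Hugging Face `datasets` library before handing it to a trainer (the column names already match what `DPOTrainer` expects):

```python
from datasets import load_dataset

# Each JSONL row becomes one example with prompt / chosen / rejected columns
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")
print(dataset[0])
```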

---

## Documentation

Full guide, API reference, and examples: **[turingspark.com/tools/feedloop](https://turingspark.com/tools/feedloop)**
