Metadata-Version: 2.4
Name: datarep
Version: 1.1.11
Summary: Your app's data rep — a local agent runtime that retrieves data from any source on behalf of consuming applications.
Project-URL: Homepage, https://datarep-ai.github.io/datarep-docs/
Project-URL: Documentation, https://datarep-ai.github.io/datarep-docs/integration-guide/
Project-URL: Repository, https://github.com/datarep-ai/datarep
Author: thyself-fyi
License: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.42.0
Requires-Dist: bcrypt>=4.2.0
Requires-Dist: browser-cookie3>=0.19.1
Requires-Dist: click>=8.1.0
Requires-Dist: cryptography>=44.0.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: httpx>=0.28.0
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: uvicorn[standard]>=0.34.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-httpx>=0.35.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# datarep

Your app's data rep.

A **rep** is someone you send to go get something on your behalf. You don't tell them how — you tell them what you need, and they figure it out. They show up, assess the situation, adapt to whatever they find, and come back with the goods.

That's what datarep does. Your app says "get me the user's Instagram DMs" and datarep handles it — asks the user how they access the data, extracts session cookies from their browser, calls the API, parses the response, and delivers structured data. No one wrote an Instagram integration. The rep figured one out at runtime.

And like a good rep, it learns. Working code is saved as **recipes** with a full access strategy, so next time it doesn't have to figure it out again. The first request for a source takes seconds; later requests reuse the saved recipe and skip the discovery work.

## Why this exists

Every app that needs user data today has to build and maintain its own integrations — or depend on a cloud service that proxies the user's data through someone else's servers. datarep is a different approach: a local agent runtime that synthesizes integrations on demand, runs on the user's machine, and never sends their data anywhere.

There isn't really a category for this yet. It's not a connector (those are pre-built by humans), not an ETL pipeline, not an SDK. It's an autonomous agent that *becomes* a connector — for any source, on the fly.

## Quick start

```bash
pip install datarep
datarep init
export ANTHROPIC_API_KEY="sk-ant-..."
datarep start
```

Register your app and get an API key:

```bash
datarep app register my-app
```

Retrieve data:

```bash
# Via CLI (interactive — agent asks follow-up questions)
datarep get "I want my Instagram DMs"

# Via HTTP API
curl -X POST http://127.0.0.1:7080/get \
  -H "Authorization: Bearer dr_<your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"query": "get my recent iMessages"}'
```
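
The same HTTP call can be made from application code. The sketch below rebuilds the curl request using only the Python standard library. The helper names `build_get_request` and `send` are illustrative and not part of datarep, and because the response schema is not described in this README, the body is returned as raw text rather than parsed.

```python
import json
import urllib.request

# Address taken from the curl example above.
BASE_URL = "http://127.0.0.1:7080"


def build_get_request(query: str, api_key: str) -> urllib.request.Request:
    """Build the POST /get request that the curl example sends.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/get",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text.

    The response format is an unknown here, so nothing is decoded beyond UTF-8.
    """
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With `datarep start` running, `send(build_get_request("get my recent iMessages", "dr_<your-api-key>"))` should be equivalent to the curl command.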

## How it works

datarep uses a **conversational agent** that leads the data retrieval process:

1. **Asks how you access the data** — "How do you usually access your Instagram — in a browser, the app, or something else?" If the answer is vague ("in a browser"), it asks a follow-up to pin down the exact source ("which browser?").
2. **Explores the device** — scans browser profiles, app databases, local files, connected devices, and iPhone backups based on your answer
3. **Extracts credentials programmatically** — pulls session cookies from Safari, Chrome, Firefox, etc. using `browser_cookie3`
4. **Guides physical actions when needed** — if data is on a phone or USB drive, walks you through connecting and backing up one step at a time, automatically detecting when each step completes
5. **Reports stats and gets approval** — tells the user what it found (record count, date range) before extracting
6. **Writes and validates retrieval code** — runs a test extraction (~1000 rows), checks quality, and saves a **recipe**
7. **Streams data on demand** — consuming apps call `GET /data/{recipe_id}` to stream the full dataset as NDJSON, piped directly from the sandbox with no memory limits
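
A consuming app might read the NDJSON stream from step 7 roughly as follows. This is a minimal sketch, not datarep's client: `iter_ndjson` and `stream_recipe` are hypothetical names, and it assumes `GET /data/{recipe_id}` accepts the same Bearer token as the rest of the HTTP API. Records are parsed one line at a time, so the full dataset is never held in memory.

```python
import json
import urllib.request
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed JSON value per line of an NDJSON byte stream.

    A chunk can end in the middle of a line, so incomplete data is
    buffered until its terminating newline arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the rest waits.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            if line.strip():  # skip blank lines
                yield json.loads(line)
    if buffer.strip():  # final record without a trailing newline
        yield json.loads(buffer)


def stream_recipe(recipe_id: str, api_key: str,
                  base_url: str = "http://127.0.0.1:7080") -> Iterator[Any]:
    """Stream records for a recipe (assumes Bearer auth on this endpoint)."""
    request = urllib.request.Request(
        f"{base_url}/data/{recipe_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        # Read fixed-size chunks until the server closes the stream.
        yield from iter_ndjson(iter(lambda: response.read(8192), b""))
```

Buffering raw bytes rather than decoded text also keeps multi-byte UTF-8 characters intact when a chunk boundary falls inside one.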

Recipes are **fault-tolerant** — per-row error handling ensures a single bad row never kills the stream. Failed rows are logged, and datarep's agent automatically fixes the recipe so the consuming app can retry just the missing rows.
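
Per-row error handling of this kind can be sketched as a generator that isolates each row in its own `try` block. This illustrates the pattern only and is not the code datarep generates: `stream_rows`, `parse_row`, and the `failed` list are assumed names.

```python
import json
import logging
from typing import Any, Callable, Iterable, Iterator

logger = logging.getLogger("recipe")


def stream_rows(raw_rows: Iterable[Any],
                parse_row: Callable[[Any], Any],
                failed: list[int]) -> Iterator[str]:
    """Yield one NDJSON line per row that parses and serializes cleanly.

    A row that raises is logged and its index is appended to `failed`;
    the stream then continues with the next row.
    """
    for index, raw in enumerate(raw_rows):
        try:
            line = json.dumps(parse_row(raw))
        except Exception as exc:  # deliberately broad: one row must not stop the rest
            logger.warning("row %d failed: %s", index, exc)
            failed.append(index)
            continue
        yield line + "\n"
```

Recording the indices of failed rows is what would let a caller request only those rows again once the recipe has been repaired.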

The agent has full read-only filesystem access and open network access. It never asks you to manually extract data it can get programmatically — the only exceptions are authentication (asking you to log into a service) and **device-assisted retrieval** (guiding you step-by-step through connecting a phone or plugging in a USB drive). For physical actions, datarep uses a self-monitoring protocol: it gives you one instruction at a time and automatically detects when each step is done — no need to confirm manually.
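
The self-monitoring idea can be illustrated with a small polling loop: show one instruction, then check a completion condition until it holds. This is a sketch under stated assumptions, not datarep's actual protocol. `Step`, `run_steps`, the polling interval, and the timeout are all hypothetical.

```python
import time
from typing import Callable, NamedTuple, Sequence


class Step(NamedTuple):
    instruction: str              # what the user is asked to do
    is_done: Callable[[], bool]   # returns True once the action is detected


def run_steps(steps: Sequence[Step],
              poll_interval: float = 1.0,
              timeout: float = 300.0,
              say: Callable[[str], None] = print,
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic) -> bool:
    """Give one instruction at a time and wait until each is detected as done.

    Returns True if every step completed, or False if a step timed out.
    """
    for step in steps:
        say(step.instruction)
        deadline = clock() + timeout
        while not step.is_done():
            if clock() >= deadline:
                return False
            sleep(poll_interval)
    return True
```

A step's `is_done` check could be anything observable from the machine, such as a mount point appearing after a USB drive is plugged in. Passing `say`, `sleep`, and `clock` as parameters keeps the loop testable without real waiting.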

## Interfaces

| Interface | Use case |
|-----------|----------|
| **HTTP API** (`localhost:7080`) | Primary interface for all apps. Bearer token auth. Supports conversational sessions. |
| **MCP server** | Native interface for agentic/LLM-powered apps. |
| **CLI** (`datarep`) | Interactive retrieval, setup, source management, debugging. |

## Integration guide

See the **[integration guide](https://datarep-ai.github.io/datarep-docs/integration-guide/)** for the full walkthrough: API reference, conversational sessions, authentication, MCP setup, recipes, and code examples.

## Development

```bash
pip install -e ".[dev]"
pytest
```
