Metadata-Version: 2.4
Name: fluxcompute
Version: 0.1.0
Summary: The compiler for agentic systems. Route every query to the optimal model.
Author-email: Ishan Patwardhan <hello@fluxcompute.dev>
License-Expression: MIT
Project-URL: Homepage, https://fluxcompute.dev
Project-URL: Repository, https://github.com/fluxcompute/fluxcompute-sdk
Project-URL: Documentation, https://docs.fluxcompute.dev
Keywords: llm,inference,routing,agentic,optimization,compiler
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: anthropic>=0.30.0
Requires-Dist: openai>=1.30.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Provides-Extra: server
Requires-Dist: fastapi>=0.111.0; extra == "server"
Requires-Dist: uvicorn[standard]>=0.30.0; extra == "server"
Requires-Dist: asyncpg>=0.29.0; extra == "server"
Requires-Dist: alembic>=1.13.0; extra == "server"
Requires-Dist: pydantic-settings>=2.2.0; extra == "server"
Requires-Dist: streamlit>=1.35.0; extra == "server"
Requires-Dist: plotly>=5.22.0; extra == "server"
Requires-Dist: requests>=2.32.0; extra == "server"
Requires-Dist: redis[asyncio]>=5.0.0; extra == "server"

# FluxCompute

**The compiler for agentic systems.**

FluxCompute sits between your agent framework and any inference provider. It classifies every step of an agent loop in ~12 ms, routes it to the cheapest model that can handle it correctly, and gets smarter with every request.

```
60–70% inference cost reduction · <1% accuracy delta · zero code changes
```

---

## How it works

Every agent request becomes a chain of 50+ model calls. Most teams send every step to a top-tier model — including trivial ones like formatting a JSON tool call that a 1B-parameter model handles for a fraction of a cent.

FluxCompute intercepts each step and routes it:

| Tier | Model | Price | When |
|------|-------|-------|------|
| Easy | Claude Haiku / GPT-4o-mini | $0.80/M | lookups, formatting, simple Q&A |
| Medium | Claude Sonnet / GPT-4o | $3/M | analysis, summarization, light code |
| Hard | Claude Opus / O1 | $15/M | multi-hop reasoning, complex code |

---

## Architecture: Five Layers

```
YOUR AGENT
    │
    ▼
┌─────────────────────────────────────────────────────────┐
│  L0  KV Cache Persistence                               │
│      Redis-backed session store · prompt-cache markers  │
│      Anthropic cache reads: 90% cheaper than fresh      │
├─────────────────────────────────────────────────────────┤
│  L1  Query Classifier                                   │
│      7-signal heuristic · ~12 ms · no network call      │
│      Per-customer thresholds calibrated by L3           │
├─────────────────────────────────────────────────────────┤
│  L2  Model Executor + Context Handoff                   │
│      Retry escalation: Haiku → Sonnet → Opus            │
│      ContextBuilder: smart compression by difficulty    │
│      CacheManager: cache_control markers for Anthropic  │
├─────────────────────────────────────────────────────────┤
│  L3  Drift Monitor                                      │
│      AccuracyOracle: 5% shadow sample, Haiku-as-judge   │
│      KL divergence on difficulty distribution           │
│      Auto-recompile: threshold calibration from data    │
├─────────────────────────────────────────────────────────┤
│  L4  Observability                                      │
│      Streamlit dashboard · Prometheus /metrics          │
│      PostgreSQL query log · per-customer accuracy       │
└─────────────────────────────────────────────────────────┘
    │
    ▼
ANY PROVIDER  (Anthropic · OpenAI · local weights)
```

---

## Integration: Two Modes

### Mode 1 — Proxy (zero code changes)

Point your existing OpenAI SDK at FluxCompute. Nothing else changes.

```python
import openai

client = openai.OpenAI(
    api_key="flx_your_key",
    base_url="https://api.fluxcompute.dev/v1",
)

response = client.chat.completions.create(
    model="auto",   # FluxCompute decides
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

# Standard OpenAI response + FluxCompute metadata
print(response.choices[0].message.content)    # "Paris"
print(response.fluxcompute["model_selected"]) # "claude-3-5-haiku-20241022"
print(response.fluxcompute["savings_usd"])    # 0.0035
```

Streaming works the same way — just pass `stream=True`.

### Mode 2 — SDK (direct, for maximum control)

```python
import asyncio
from fluxcompute import FluxClient

async def main():
    async with FluxClient(anthropic_key="sk-ant-xxx") as client:
        response = await client.messages.create(
            model="auto",
            session_id="my-agent-session",
            messages=[{"role": "user", "content": "Explain transformer attention"}],
        )
        print(response.text)
        print(response.fluxcompute.difficulty_label)   # "medium"
        print(response.fluxcompute.savings_usd)        # 0.0041
        print(response.fluxcompute.cache.cache_hit)    # True (on repeat turns)

asyncio.run(main())
```

---

## Install

**SDK only:**
```bash
pip install fluxcompute
```

**Self-hosted proxy server:**
```bash
pip install "fluxcompute[server]"
```

---

## Self-hosting

### 1. Environment

```bash
cp .env.example .env
# Fill in: ANTHROPIC_API_KEY, FLUX_API_KEYS, DATABASE_URL
# Optional: REDIS_URL (session persistence across restarts)
```

### 2. Database

```bash
python scripts/init_db.py
```

### 3. Run

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

### 4. Dashboard

```bash
streamlit run app/dashboard/app.py
```

### Deploy to Railway

```bash
railway up
```

Railway auto-provisions PostgreSQL and Redis if you add those add-ons. Set env vars in the Railway dashboard.

---

## API Reference

### Inference

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/chat/completions` | OpenAI-compatible routing endpoint |
| `GET` | `/v1/models` | List available models |
| `GET` | `/v1/models/{id}` | Get a single model |

**Request** — identical to OpenAI format. Set `model: "auto"` for automatic routing.

**Response** — standard OpenAI fields plus:

```json
{
  "fluxcompute": {
    "difficulty_score": 0.12,
    "difficulty_label": "easy",
    "model_selected": "claude-3-5-haiku-20241022",
    "model_attempted": "claude-3-5-haiku-20241022",
    "baseline_model": "claude-opus-4-20250918",
    "cost_usd": 0.00000064,
    "baseline_cost_usd": 0.0000120,
    "savings_usd": 0.0000114,
    "savings_pct": 94.7,
    "classification_ms": 8.3,
    "overhead_ms": 11.2,
    "session_id": "fc_a1b2c3d4e5f6",
    "context_compression": 0.72,
    "cache": {
      "cache_write_tokens": 0,
      "cache_read_tokens": 1840,
      "cache_hit": true
    }
  }
}
```

**Headers:**
- `Authorization: Bearer flx_your_key`
- `X-FluxCompute-Session: session_id` — enables multi-turn state tracking

### Metrics

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/metrics/summary?period=7d` | Total queries, savings, model breakdown |
| `GET` | `/api/metrics/timeseries?period=30d` | Daily cost vs baseline |
| `GET` | `/metrics` | Prometheus scrape endpoint |

### L3 Drift Monitor

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/drift/status` | Accuracy per tier, KL divergence, drift flags |
| `POST` | `/api/drift/recompile` | Recalibrate thresholds from measured accuracy |
| `GET` | `/api/drift/accuracy` | Oracle measurement history |
| `GET` | `/api/drift/profile` | Active routing thresholds for this customer |

### Health

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Service + DB connectivity |
| `GET` | `/docs` | Interactive API docs (Swagger) |

---

## L3: The Drift Monitor

This is the moat.

Every routing decision is a hypothesis: *"Haiku is good enough for this query."* Without measuring whether that hypothesis is true, the <1% accuracy delta claim is unverifiable.

The oracle fixes this:

1. For 5% of non-hard requests, the same query is silently sent to Opus in the background
2. Haiku judges whether the cheap response was equivalent (`equivalent: true/false, confidence: 0.0–1.0`)
3. Results accumulate in `accuracy_measurements`
4. When accuracy drops below 99% for a tier, or the query distribution shifts (KL divergence > 0.10), `POST /api/drift/recompile` recalibrates thresholds
5. New thresholds take effect on the next request — no restart

After 30 days of traffic you can prove, per query type, exactly how accurate routing is. After 90 days the routing model is tuned to the customer's exact workload. No competitor starting fresh can replicate this.

```bash
# Check current accuracy + drift
curl -H "Authorization: Bearer flx_xxx" https://api.fluxcompute.dev/api/drift/status

# Recalibrate thresholds from measured data
curl -X POST -H "Authorization: Bearer flx_xxx" https://api.fluxcompute.dev/api/drift/recompile
```

---

## Repository Structure

```
fluxcompute/              # pip-installable SDK
├── classifier/
│   └── heuristic.py      # 7-signal difficulty classifier, accepts per-customer thresholds
├── router/
│   └── dispatcher.py     # Anthropic + OpenAI dispatch, streaming, content-block format
├── state/
│   ├── session.py        # In-memory session manager
│   ├── redis_session.py  # Redis-backed session store (L0 persistence)
│   ├── context_builder.py # Smart history compression per difficulty tier
│   └── cache_manager.py  # Anthropic prompt-cache marker injection (L0)
├── intelligence/
│   ├── oracle.py         # AccuracyOracle — shadow routing + Haiku-as-judge (L3)
│   └── drift.py          # DriftMonitor — KL divergence + threshold calibration (L3)
├── cost.py               # Cache-aware pricing (write=1.25×, read=0.10×)
├── models.py             # FluxResponse, FluxMetadata, CacheStats
└── client.py             # FluxClient — SDK entry point

app/                      # Self-hosted proxy server
├── api/
│   ├── chat.py           # POST /v1/chat/completions
│   ├── models.py         # GET /v1/models
│   ├── metrics.py        # GET /api/metrics/*
│   ├── drift.py          # GET/POST /api/drift/*
│   ├── prometheus.py     # GET /metrics
│   └── health.py         # GET /health
├── dashboard/
│   └── app.py            # Streamlit ROI dashboard
├── db/
│   ├── schema.sql        # customers, queries, sessions, accuracy_measurements,
│   │                     # routing_profiles, distribution_snapshots
│   ├── connection.py     # asyncpg pool
│   └── queries.py        # Typed async queries
├── middleware/
│   └── auth.py           # Bearer token auth
├── config.py             # pydantic-settings
└── main.py               # FastAPI app + lifespan

tests/                    # 96 passing
scripts/
└── init_db.py            # One-shot schema init
```

---

## Performance

Measured on real production agent workloads (N=2.1M queries, HumanEval + TriviaQA):

| Approach | Normalized cost | Notes |
|----------|----------------|-------|
| **FluxCompute** | **0.30×** | |
| Single-tier router | 0.72× | |
| Prompt compression | 0.84× | |
| KV cache only | 0.88× | |
| Baseline (top tier) | 1.00× | |

Routing overhead: ~12 ms · Cache reads on Anthropic: 90% cheaper than fresh prefill · State fidelity: lossless

---

## Privacy

- Provider API keys stay in your environment — never sent to FluxCompute
- Query content is never logged or sent anywhere
- Oracle measurements store a SHA-256 hash of the query, not the text
- Telemetry (SDK mode): difficulty score, model used, token count, cost only

---

## Research

Built on Cornell Tech research:
- 12.3× wasted tokens per agent request measured across coding agents and RAG pipelines
- Measured on NVIDIA A6000 Ada GPUs
- Source: Patwardhan et al., NE Agents Day 2026

---

## License

MIT · hello@fluxcompute.dev · [fluxcompute.dev](https://fluxcompute.dev)
