Metadata-Version: 2.4
Name: breviadev
Version: 0.1.0
Summary: Local proxy that optimizes LLM context by 55-97%. Same quality, fraction of the cost.
Project-URL: Homepage, https://brevia.dev
Project-URL: Documentation, https://brevia.dev/docs
Project-URL: Repository, https://github.com/brevia-dev/brevia
Project-URL: Issues, https://github.com/brevia-dev/brevia/issues
Author-email: Brevia <hello@brevia.dev>
License-Expression: MIT
License-File: LICENSE
Keywords: anthropic,claude,cost-reduction,llm,optimization,proxy,tokens
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: httpx>=0.27
Requires-Dist: rich>=13.0
Requires-Dist: starlette>=0.38
Requires-Dist: uvicorn>=0.30
Description-Content-Type: text/markdown

# Brevia

**Save 55-93% on LLM tokens. Same quality, fraction of the cost.**

Brevia is a local proxy that sits between your tools and the Anthropic API. It algorithmically optimizes context before it reaches Claude — cutting tokens by 55-97% on large contexts while maintaining full recall and producing more precise answers.

Works with Claude Code, Cursor, Continue, aider, and any tool that uses the Anthropic SDK.

---

## Quick Start

```bash
pip install brevia
brevia login          # Opens browser — sign in with GitHub or Google
brevia serve          # Starts proxy on localhost:8420
```

Then add to your shell profile (`~/.zshrc`, `~/.bashrc`):

```bash
export ANTHROPIC_BASE_URL=http://localhost:8420
```

That's it. Everything works exactly as before — but cheaper and often smarter.

---

## How It Works

```
Your Tool (Claude Code, Cursor, etc.)
    │
    │  ANTHROPIC_BASE_URL=http://localhost:8420
    ▼
┌─────────────────────────────────┐
│         Brevia Proxy            │  ← Runs locally
│                                 │     Zero LLM cost
│  1. Analyze context structure   │     ~5ms latency
│  2. Score relevance by query    │
│  3. Extract key sections        │
│  4. Inject liberation prompt    │
└─────────────────────────────────┘
    │
    │  Optimized payload (55-97% smaller)
    ▼
┌─────────────────────────────────┐
│      api.anthropic.com          │  ← Your API key
│                                 │     Your account
│  Claude processes focused       │     You pay less
│  context = better answers       │
└─────────────────────────────────┘
    │
    │  Response streams back
    ▼
Your Tool (unchanged behavior)
```

**Key insight:** Less noise = better answers. When Claude sees 2.4k tokens of the RIGHT code instead of 95k tokens of everything, it produces more precise diagnoses.

---

## Benchmarks

Tested against Claude Opus 4.6/4.7 on real codebases (Django, FastAPI, psf/requests):

| Context Size | Cost Savings | Quality Impact |
|-------------|--------------|----------------|
| < 2k tokens | 0% (passthrough) | None |
| 10k tokens | ~55% | None |
| 50k tokens | 76-93% | None |
| 95k tokens | 76% | **Improved** (more precise) |

### Real Code Analysis (4-Path Comparison)

| Path | Total Cost | vs Direct |
|------|-----------|-----------|
| Direct Opus (full context) | $0.836 | baseline |
| Brevia + Opus | $0.278 | **67% cheaper** |

### Structured Data (50k token billing report)

| Metric | Value |
|--------|-------|
| Token reduction | 97.4% |
| Cost savings | 93.3% |
| Recall | 1.0/1.0 (perfect) |

Full benchmark methodology and raw data: [benchmarks/BENCHMARKS.md](benchmarks/BENCHMARKS.md)

---

## Commands

| Command | Description |
|---------|-------------|
| `brevia login` | Authenticate (opens browser) |
| `brevia serve` | Start the proxy |
| `brevia serve -p 9000` | Start on custom port |
| `brevia stats` | Show your savings stats |
| `brevia stats -d 30` | Show last 30 days |
| `brevia logout` | Remove credentials |

---

## What You'll See

When `brevia serve` is running:

```
╭─ 🏛️  Brevia ──────────────────────────────────╮
│ Brevia is running                              │
│                                                │
│   Proxy:    http://127.0.0.1:8420              │
│   User:     @yourname                          │
│   Status:   Optimizing all Anthropic API calls │
│                                                │
│   Set this in your shell:                      │
│   export ANTHROPIC_BASE_URL=http://127.0.0.1:8420 │
╰────────────────────────────────────────────────╯
```

Run `brevia stats` anytime:

```
╭─ 📊 Brevia Stats ─────────────────────────────╮
│ All-time savings                               │
│                                                │
│   Days active:    12                           │
│   Total requests: 847                          │
│   Tokens saved:   4,230,000                    │
│   Avg reduction:  71%                          │
│   Est. $ saved:   $63.45                       │
╰────────────────────────────────────────────────╯
```

---

## Where Brevia Helps Most

- **Large contexts (50k+ tokens):** 76-97% savings with equal or better quality
- **Noisy contexts:** Relevant info buried in boilerplate — Brevia extracts what matters
- **Multi-file contexts:** Only sends relevant files to Claude

## Where Brevia Does NOT Help

- **Tiny contexts (< 2k tokens):** Passed through unchanged (no overhead)
- **Already-focused queries:** If you're already sending only relevant code, nothing to cut
- **Full-file reasoning tasks:** Some tasks need the entire file flow

---

## Privacy & Security

- Your API key is **passed through** — Brevia never stores it
- Optimization happens **locally** — your code never leaves your machine
- Telemetry is **aggregated stats only**: token counts, savings, request count
- **No content** is ever sent to Brevia servers
- Credentials stored in `~/.brevia/` with restricted permissions

---

## Platform Support

- macOS (Intel + Apple Silicon)
- Linux (x86_64 + ARM64)
- Windows 10+

Requires Python 3.10+.

---

## Enterprise

Need team-wide deployment, custom optimization rules, or priority support?

**Contact us:** enterprise@brevia.dev

---

## How It's Different

| | Brevia | Prompt caching | Summarization |
|---|---|---|---|
| Approach | Algorithmic extraction | Cache repeated prefixes | LLM summarizes context |
| Cost | Zero (local CPU) | Reduced on cache hit | Adds an LLM call |
| Latency | ~5ms | None on hit | +1-3s per call |
| Quality | Equal or better | Same | Often degrades |
| Works with | Any Anthropic tool | SDK only | Custom code only |

Brevia stacks with prompt caching — use both for maximum savings.

---

## License

MIT

---

Built by engineers who got tired of paying for noise.
