Metadata-Version: 2.4
Name: tracer-llm
Version: 0.3.2
Summary: TRACER: Trace-Based Adaptive Cost-Efficient Routing. Turn LLM traces into parity-gated routing policies - cut 90%+ of LLM calls with formal guarantees.
Project-URL: Homepage, https://github.com/adrida/tracer
Project-URL: Repository, https://github.com/adrida/tracer
Project-URL: Documentation, https://github.com/adrida/tracer#readme
Project-URL: Bug Tracker, https://github.com/adrida/tracer/issues
License: MIT License
        
        Copyright (c) 2025 TRACER Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: conformal-prediction,cost-reduction,explainability,learn-to-defer,llm,machine-learning,routing,surrogate,xai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Requires-Dist: joblib>=1.1
Requires-Dist: numpy<2.1,>=1.21
Requires-Dist: scikit-learn>=1.0
Provides-Extra: all
Requires-Dist: faiss-cpu>=1.7; extra == 'all'
Requires-Dist: kaleido>=0.2; extra == 'all'
Requires-Dist: matplotlib>=3.5; extra == 'all'
Requires-Dist: numpy<2.1,>=1.21; extra == 'all'
Requires-Dist: pandas>=1.4; extra == 'all'
Requires-Dist: plotly>=5.0; extra == 'all'
Requires-Dist: sentence-transformers<5,>=2.2; extra == 'all'
Requires-Dist: torch>=2.0; extra == 'all'
Requires-Dist: xgboost>=1.7; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: embeddings
Requires-Dist: numpy<2.1,>=1.21; extra == 'embeddings'
Requires-Dist: sentence-transformers<5,>=2.2; extra == 'embeddings'
Requires-Dist: torch>=2.0; extra == 'embeddings'
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.7; extra == 'faiss'
Provides-Extra: notebooks
Requires-Dist: datasets>=2.0; extra == 'notebooks'
Requires-Dist: faiss-cpu>=1.7; extra == 'notebooks'
Requires-Dist: huggingface-hub>=0.14; extra == 'notebooks'
Requires-Dist: matplotlib>=3.5; extra == 'notebooks'
Requires-Dist: numpy<2.1,>=1.21; extra == 'notebooks'
Requires-Dist: pandas>=1.4; extra == 'notebooks'
Requires-Dist: sentence-transformers<5,>=2.2; extra == 'notebooks'
Requires-Dist: torch>=2.0; extra == 'notebooks'
Provides-Extra: viz
Requires-Dist: kaleido>=0.2; extra == 'viz'
Requires-Dist: plotly>=5.0; extra == 'viz'
Provides-Extra: xgboost
Requires-Dist: xgboost>=1.7; extra == 'xgboost'
Description-Content-Type: text/markdown

# TRACER

**Trace-Based Adaptive Cost-Efficient Routing**

[![arXiv](https://img.shields.io/badge/arXiv-2604.14531-b31b1b.svg)](https://arxiv.org/abs/2604.14531)
[![Hugging Face](https://img.shields.io/badge/🤗%20HF-Papers-yellow)](https://huggingface.co/papers/2604.14531)
[![PyPI](https://img.shields.io/pypi/v/tracer-llm)](https://pypi.org/project/tracer-llm/)
[![Downloads](https://static.pepy.tech/badge/tracer-llm)](https://pepy.tech/project/tracer-llm)
[![Downloads](https://static.pepy.tech/badge/tracer-llm/month)](https://pepy.tech/project/tracer-llm)
[![Python](https://img.shields.io/pypi/pyversions/tracer-llm)](https://pypi.org/project/tracer-llm/)
[![npm](https://img.shields.io/npm/v/@tracer-llm/watch?label=%40tracer-llm%2Fwatch&color=cb3837&logo=npm)](https://www.npmjs.com/package/@tracer-llm/watch)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![CI](https://img.shields.io/badge/CI-passing-brightgreen)](https://github.com/adrida/tracer/actions)
[![Website](https://img.shields.io/badge/website-tracerml.ai-blue)](https://tracerml.ai)
[![Docs](https://img.shields.io/badge/docs-reference-blue)](docs/)

Most LLM-based classification pipelines use a large language model for every single input. In practice, the vast majority of that traffic is predictable - a lightweight traditional ML model (logistic regression, gradient-boosted trees, or a small neural net) can match the LLM's output with near-perfect agreement.

TRACER learns the decision boundary between "easy" and "hard" inputs directly from your LLM's own classification traces. It fits a fast, non-LLM surrogate on the easy partition, gates it with a calibrated acceptor, and defers only the uncertain inputs back to the LLM. Every deferred call produces a new trace, which feeds the next refit - coverage grows automatically over time. The result: **90%+ of classification calls routed to traditional ML, with formal parity guarantees against the teacher LLM and a self-improving routing policy**.

```bash
pip install tracer-llm
```

## See it work

```bash
tracer demo
```

```
  TRACER  Demo - Banking77 (77 intents · 1,500 traces)

  Routing Policy
  method      l2d
  coverage    91.4%   of traffic handled by surrogate
  teacher TA  0.920   surrogate matches teacher on handled traffic

  Cost Projection (10k queries/day)
      Without TRACER   10,000 LLM calls/day   $20.00/day
      With TRACER         863 LLM calls/day   $ 1.73/day   $6,670 saved/yr
```
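
The cost projection above is plain arithmetic, sketched below. The per-call price of $0.002 is not stated anywhere; it is inferred from the demo's $20.00 for 10,000 calls, and `project_costs` is a hypothetical helper, not part of the TRACER API.

```python
def project_costs(queries_per_day: int, llm_calls_with_router: int, price_per_call: float):
    """Return (daily cost without routing, daily cost with routing, yearly savings)."""
    baseline = queries_per_day * price_per_call
    routed = llm_calls_with_router * price_per_call
    yearly_savings = (baseline - routed) * 365
    return baseline, routed, yearly_savings


# Figures taken from the demo output; $0.002/call is implied by $20.00 per 10,000 calls.
baseline, routed, saved = project_costs(10_000, 863, 0.002)
print(f"${baseline:.2f}/day -> ${routed:.2f}/day, about ${saved:,.0f} saved per year")
```

Swap in your own daily volume and per-call price to get a projection for your workload; the demo rounds its yearly figure to the nearest $10.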

## Quickstart

Input: a JSONL file where each line contains the original text (`input`) and the label your LLM assigned (`teacher`).

```python
import tracer

# 1. Fit - learn a routing policy from your LLM's classification traces
result = tracer.fit(
    "traces.jsonl",                  # {"input": "...", "teacher": "label"} per line
    embeddings=X,                    # np.ndarray (n, dim) - precomputed text embeddings
    config=tracer.FitConfig(target_teacher_agreement=0.95),
)

# 2. Route - surrogate handles easy inputs, LLM handles the rest
router = tracer.load_router(".tracer", embedder=embedder)
out = router.predict("What is my balance?")
# {"label": "check_balance", "decision": "handled", "accept_score": 0.96}

# 3. Fallback - only invokes the LLM when the surrogate declines
text = "Some edge case"
out = router.predict(text, fallback=lambda: call_my_llm(text))
```
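
If your traces are not yet in this shape, a minimal sketch for producing the JSONL input follows. Only the `input` and `teacher` field names come from the format described above; the helper names and example rows are hypothetical.

```python
import json
import os
import tempfile


def write_traces(path, pairs):
    """Write (input_text, teacher_label) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for text, label in pairs:
            f.write(json.dumps({"input": text, "teacher": label}) + "\n")


def read_traces(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical rows; in practice these come from your LLM pipeline's logs.
pairs = [
    ("What is my balance?", "check_balance"),
    ("I lost my card", "lost_card"),
]
path = os.path.join(tempfile.mkdtemp(), "traces.jsonl")
write_traces(path, pairs)
print(read_traces(path))
```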

> **Want to go deeper?** The [concepts guide](docs/concepts.md) explains the full pipeline, model zoo, and parity gate. The [API reference](docs/api.md) covers every parameter. The [CLI reference](docs/cli.md) covers `tracer fit`, `tracer serve`, and more. For observability see [watch](docs/watch.md), and to drive Tracer Cloud from your shell see the [`tracer cloud` reference](docs/cloud.md).

## Watch your LLM traffic (free observability)

Before you fit anything, just *watch*. Wrap any LLM call and every request is
recorded locally as an OpenTelemetry GenAI span. No account, no key, and nothing
leaves your machine:

```python
import tracer

watch = tracer.watch("support_classifier", system="my-provider", model="my-model")

@watch
def classify(ticket: str) -> str:
    return call_my_llm(ticket)   # traces append to .tracer/watch/*.jsonl
```

Want them in a dashboard? Tracer Cloud observability is **free**. Mint a key
(`tracer cloud ingest-keys create`, or the Watch page in the app) and set one
env var. Your traffic streams in within seconds, and streaming is prod-safe
(batched, never adds latency or throws):

```bash
export TRACER_CLOUD_KEY=trobs_...   # or watch(..., cloud_key="trobs_...")
```

The same watched spans map 1:1 to `TraceRecord`, so once you have traffic you can
`tracer fit` a router from it. Full guide: [docs/watch.md](docs/watch.md).

## Using from JavaScript / Node.js

**Watch your JS LLM calls (free observability):** [`@tracer-llm/watch`](https://www.npmjs.com/package/@tracer-llm/watch) mirrors the Python decorator with zero dependencies, recording every call as an OpenTelemetry GenAI span (local by default, or streamed free to Tracer Cloud).

```bash
npm install @tracer-llm/watch
```

```js
import { watch } from "@tracer-llm/watch";

const w = watch("support_classifier", { system: "provider-x", model: "model-x" });

// Wrap the function that calls your model; the return value is auto-captured.
const classify = w(async (ticket) => callYourLLM(ticket));
```

Full guide: [docs/javascript.md](docs/javascript.md). To route (not just observe) from JS, log traces, fit offline with the CLI, run `tracer serve` as a sidecar, and call it via `fetch`:

```js
import fs from 'node:fs'

// 1. Log every LLM classification
fs.appendFileSync('traces.jsonl', JSON.stringify({ input: text, teacher: label }) + '\n')

// 2. At inference: embed → POST to TRACER → fallback to LLM only if deferred
// `let`, not `const`: `label` is reassigned below when the router defers
let { label, decision } = await fetch('http://localhost:8000/predict', {
  method: 'POST',
  body: JSON.stringify({ embedding }),  // same model you used at fit time
}).then(r => r.json())

if (decision === 'deferred') label = await callYourLLM(text)
```

See the [JavaScript integration guide](docs/javascript.md) for the full setup including embeddings, docker-compose, batch prediction, and continual learning.

## How it works

```
User query → [Embedder] → [ML Surrogate] → [Acceptor Gate]
                                                |          |
                                            score >= t   score < t
                                                |          |
                                          Local answer   Defer to LLM
                                          (traditional ML)
```

The surrogate is **not another LLM** - it is a classical ML or shallow DL model. By default the zoo is lightweight and fast (logistic regression, SGD, and small feed-forward nets); the tree-based models (decision tree, random forest, extra-trees, gradient boosting) are heavier and opt-in with `tracer fit --trees`. This is what makes the cost reduction real: inference is CPU-bound, sub-millisecond, and free.
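
The gate in the diagram reduces to a threshold comparison. The sketch below illustrates that logic only and is not TRACER's implementation: `route` is a hypothetical function, and the returned dict simply mirrors the shape shown in the Quickstart.

```python
from typing import Callable, Dict, Union


def route(
    label: str,
    accept_score: float,
    threshold: float,
    fallback: Callable[[], str],
) -> Dict[str, Union[str, float]]:
    """Accept the surrogate's label when the score clears the threshold, else defer."""
    if accept_score >= threshold:
        return {"label": label, "decision": "handled", "accept_score": accept_score}
    # Below the threshold: this is the only branch that invokes the LLM.
    return {"label": fallback(), "decision": "deferred", "accept_score": accept_score}


# Hypothetical scores and labels, for illustration.
easy = route("check_balance", 0.96, threshold=0.90, fallback=lambda: "llm_label")
hard = route("check_balance", 0.41, threshold=0.90, fallback=lambda: "llm_label")
print(easy["decision"], hard["decision"])
```

Passing the fallback as a callable keeps the LLM call lazy, so handled inputs never pay for it.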

1. **Fit** - train a suite of candidate surrogates on your LLM's classification traces; select the best via cross-validated teacher agreement
2. **Gate** - attach a learned acceptor that estimates, per-input, whether the surrogate will agree with the teacher
3. **Calibrate** - sweep the acceptor threshold to maximise coverage at your target parity (e.g. ≥ 95% teacher agreement)
4. **Guard** - block deployment if the best candidate cannot clear the parity bar on held-out data
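
Steps 3 and 4 can be pictured as a sweep over candidate thresholds on held-out data. The sketch below is a plain empirical sweep written for illustration; the function name and inputs are assumptions, and TRACER's actual calibration may apply statistical corrections that this version omits.

```python
from typing import List, Optional, Tuple


def calibrate_threshold(
    scores: List[float], agrees: List[bool], target: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, coverage, agreement) with the highest coverage whose
    agreement on accepted held-out inputs is at least `target`, or None."""
    best = None
    for t in sorted(set(scores)):
        # Inputs the gate would accept at threshold t (never empty: t is a score).
        accepted = [a for s, a in zip(scores, agrees) if s >= t]
        coverage = len(accepted) / len(scores)
        agreement = sum(accepted) / len(accepted)
        if agreement >= target and (best is None or coverage > best[1]):
            best = (t, coverage, agreement)
    return best  # None corresponds to the Guard step: nothing clears the bar


# Hypothetical held-out acceptor scores and whether the surrogate matched the teacher.
scores = [0.2, 0.4, 0.6, 0.8, 0.9, 0.95]
agrees = [False, True, False, True, True, True]
print(calibrate_threshold(scores, agrees, target=0.95))
```

With these toy inputs the lowest threshold that keeps every accepted prediction in agreement is 0.8, which covers half of the held-out set.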

## Benchmark results (Banking77 - 77-class intent classification)

| Metric | Value |
|--------|-------|
| Coverage | **92.2%** of traffic handled locally |
| Teacher agreement (handled) | 96.1% |
| End-to-end accuracy | 96.4% |
| **Annual savings** (10k queries/day) | **$302,850** |

_Banking77 is a 77-class task; the tree models help here, so these numbers are with `tracer fit --trees`. The lightweight default (linear + MLP) is faster and enough for most tasks._

## Continual learning flywheel

TRACER is not a one-shot fit. Every deferred input that reaches the LLM produces a new labeled trace, which feeds back into the next refit. As the surrogate sees more of the input distribution, its coverage grows, so fewer inputs reach the LLM and costs fall, while the quality guarantee holds at every iteration.

```
Day 1:  2,000 traces → 84% coverage → 8,400 calls/day saved
Day 3:  6,000 traces → 90% coverage → 9,000 calls/day saved
Day 5: 10,000 traces → 92% coverage → 9,200 calls/day saved
```

```python
tracer.update("new_traces.jsonl", embeddings=X_new)  # refit with new production traces
```

The parity gate re-calibrates on each update, so coverage only increases when the surrogate actually earns it.
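
One way to collect the traces for the next refit is to append a line whenever a prediction was deferred. This is a sketch under assumptions: it presumes that after a fallback the result carries `"decision": "deferred"` and the LLM's label under `"label"`, which the examples here suggest but do not confirm, and `log_deferred` is a hypothetical helper.

```python
import json


def log_deferred(out: dict, text: str, path: str) -> bool:
    """Append a trace when the LLM produced the label; return True if a line was written."""
    if out.get("decision") != "deferred":
        return False  # handled by the surrogate: no new teacher label to record
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"input": text, "teacher": out["label"]}) + "\n")
    return True
```

The resulting file has the same `input`/`teacher` shape as the original traces, so it can be passed to `tracer.update` together with embeddings for the new inputs.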

## Embedder options

```python
from tracer import Embedder

embedder = Embedder.from_sentence_transformers("BAAI/bge-small-en-v1.5")  # local
embedder = Embedder.from_endpoint("https://api.example.com/embed", headers={...})  # API
embedder = Embedder.from_callable(my_fn)  # any function
# or skip the embedder and pass raw np.ndarray embeddings directly
```

Need to compute embeddings at fit time?

```bash
pip install "tracer-llm[embeddings]"   # adds sentence-transformers (quoted so shells like zsh don't expand the brackets)
```

```python
X = tracer.embed(texts)  # default: all-MiniLM-L6-v2 (384-dim)
```

## CLI

| Command | What it does |
|---------|-------------|
| `tracer demo` | Zero-setup demo on real data |
| `tracer scan traces.jsonl --html scan.html` | Day-one read: how much traffic is certifiably routable, with a 3D map |
| `tracer fit traces.jsonl --target 0.95` | Fit a routing policy |
| `tracer update new_traces.jsonl` | Refit with new traces |
| `tracer report-html` | Open the HTML report |
| `tracer serve .tracer --port 8000` | HTTP prediction server |
| `tracer cloud login` | Drive Tracer Cloud from the terminal: create, train, route, test, and watch tracers, at parity with the dashboard ([docs](docs/cloud.md)) |

`tracer scan` is the fast, conservative first look (similarity grouping plus exact held-out bounds, no training). It needs about 1,000 traces and works best around 5,000; below 1,000 it asks you to collect more, or pass `--force` for a best-effort floor. Embeddings are computed locally by default (sentence-transformers), or point it at your own embedding service with `--embed-url`. `tracer fit` then trains the real router and certifies more of the same traffic. The HTML report includes an interactive 3D map of your embedding space with a verdict/label colour toggle. See the [CLI reference](docs/cli.md) for every flag.
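
For a Python client talking to `tracer serve`, a standard-library sketch follows. The `/predict` path, port 8000, and the `{"embedding": ...}` body are taken from the JavaScript example in this README; the helper names and the `Content-Type` header are assumptions, so check the CLI reference for the exact request contract.

```python
import json
import urllib.request


def build_predict_request(embedding, base_url="http://localhost:8000"):
    """Build a POST /predict request carrying one embedding as JSON."""
    body = json.dumps({"embedding": list(embedding)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(embedding, base_url="http://localhost:8000", timeout=5.0):
    """Send the request and decode the JSON reply (needs a running server)."""
    request = build_predict_request(embedding, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Use the same embedding model as at fit time, and call your LLM when the reply's `decision` is `"deferred"`.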

## What's in `.tracer/`

| File | Contents |
|------|----------|
| `manifest.json` | Method, coverage, teacher agreement, label space |
| `pipeline.joblib` | Surrogate + acceptor + calibrated thresholds |
| `frontier.json` | All candidates at each quality target |
| `qualitative_report.json` | Per-label slices, boundary pairs, examples |
| `report.html` | Visual HTML report |
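
To inspect a fitted policy without loading the pipeline, `manifest.json` can be read as ordinary JSON. The sketch below deliberately assumes no particular key names, since the exact schema lives in the artifacts documentation; both helpers are hypothetical.

```python
import json
from pathlib import Path


def load_manifest(artifact_dir=".tracer"):
    """Load manifest.json from an artifact directory and return it as a dict."""
    path = Path(artifact_dir) / "manifest.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize(manifest: dict) -> str:
    """One line per top-level key, without assuming any particular key names."""
    return "\n".join(f"{key}: {manifest[key]!r}" for key in sorted(manifest))
```

Calling `print(summarize(load_manifest()))` after a fit lists whatever the manifest contains, which is handy in CI logs.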

## Install

```bash
pip install tracer-llm                  # core (numpy + scikit-learn + joblib)
pip install "tracer-llm[embeddings]"    # + sentence-transformers and torch
pip install "tracer-llm[all]"           # everything
```

## Docs

| | |
|---|---|
| [Concepts](docs/concepts.md) | Pipeline internals, model zoo, parity gate |
| [API reference](docs/api.md) | Every function, parameter, and return type |
| [CLI reference](docs/cli.md) | `tracer fit`, `tracer serve`, `tracer demo`, and more |
| [JavaScript / Node.js](docs/javascript.md) | Full integration guide for JS pipelines |
| [Artifacts](docs/artifacts.md) | `.tracer/` directory schema |
| [Troubleshooting](docs/troubleshooting.md) | `selected_method=null`, coverage drift, embedding-dim mismatch |
| [AGENTS.md](AGENTS.md) | Integration guide for AI coding assistants |

## Paper

**TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification**  
Adam Rida, arXiv 2026

[![arXiv](https://img.shields.io/badge/arXiv-2604.14531-b31b1b.svg)](https://arxiv.org/abs/2604.14531) [![Hugging Face](https://img.shields.io/badge/🤗%20HF-Papers-yellow)](https://huggingface.co/papers/2604.14531)

```bibtex
@article{rida2026tracer,
  title   = {TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification},
  author  = {Rida, Adam},
  journal = {arXiv preprint arXiv:2604.14531},
  year    = {2026}
}
```

## License

MIT
