Metadata-Version: 2.4
Name: noburn
Version: 0.1.0
Summary: Hard LLM spend enforcement for Python — pre-call budget checks and per-user metering.
Project-URL: Homepage, https://noburn.dev
Project-URL: Documentation, https://noburn.dev/docs
Project-URL: Repository, https://github.com/orvi2014/noburn
License: MIT
Keywords: ai,anthropic,budget,cost-control,langchain,llm,openai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# noburn

Hard LLM spend enforcement for Python — pre-call budget checks and per-user metering.

Check a planned LLM call against project-, per-user-, and per-run budget caps
*before* you make it, then record what it actually cost.

## Install

```bash
pip install noburn
```

Requires Python ≥ 3.9. No third-party dependencies. Ships type hints (`py.typed`).

## Quick start

```python
import os
from noburn import NoburnGuard

guard = NoburnGuard(
    api_key=os.environ["NOBURN_API_KEY"],
    project_id=os.environ["NOBURN_PROJECT_ID"],
    budget_cap_usd=10.0,     # optional local fallback cap
    on_error="allow",        # "allow" (default, fail-open) | "block" (fail-closed)
)

decision = guard.check(
    model="gpt-4o",
    estimated_tokens_in=1000,
    estimated_tokens_out=300,
    end_user_id="user_123",  # optional — enforces that user's cap
)

if decision.blocked:
    raise RuntimeError(f"Blocked by noburn: {decision.block_reason}")

# ... make your LLM call ...

guard.record(
    model="gpt-4o",
    tokens_in=980,
    tokens_out=290,
    cost_usd=0.005,
    was_blocked=False,
    end_user_id="user_123",
)
```

`record()` is fire-and-forget: it returns immediately, delivers the event on a
background daemon thread, and never raises, so it adds no network latency to
your call path.
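
The check, call, record sequence above can be folded into one helper. This is a minimal sketch, not part of noburn: `guarded_call`, `BudgetExceeded`, and `call_llm` are hypothetical names, and it assumes `call_llm` returns `(result, tokens_in, tokens_out, cost_usd)` and that `guard` exposes `check()` and `record()` with the keyword arguments shown in the quick start.

```python
from typing import Any, Callable, Optional, Tuple


class BudgetExceeded(RuntimeError):
    """Hypothetical app-level error raised when noburn blocks a call."""


def guarded_call(
    guard: Any,
    model: str,
    call_llm: Callable[[], Tuple[Any, int, int, float]],
    estimated_tokens_in: int,
    estimated_tokens_out: int,
    end_user_id: Optional[str] = None,
) -> Any:
    """Check the budget, run the LLM call, then record what it actually cost."""
    # Only forward end_user_id when the caller supplied one.
    user_kwargs = {} if end_user_id is None else {"end_user_id": end_user_id}

    decision = guard.check(
        model=model,
        estimated_tokens_in=estimated_tokens_in,
        estimated_tokens_out=estimated_tokens_out,
        **user_kwargs,
    )
    if decision.blocked:
        raise BudgetExceeded(f"Blocked by noburn: {decision.block_reason}")

    # Assumption: call_llm reports its own token usage and cost.
    result, tokens_in, tokens_out, cost_usd = call_llm()

    guard.record(
        model=model,
        tokens_in=tokens_in,
        tokens_out=tokens_out,
        cost_usd=cost_usd,
        was_blocked=False,
        **user_kwargs,
    )
    return result
```

Taking the guard as a parameter keeps the helper easy to test with a stub object in place of a real `NoburnGuard`.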

## Agent runs

Bound a single agent invocation with its own cap:

```python
run = guard.start_run(budget_cap_usd=0.5)

decision = guard.check(
    model="gpt-4o",
    estimated_tokens_in=1000,
    estimated_tokens_out=300,
    run_id=run.run_id,
)

guard.record(model="gpt-4o", tokens_in=980, tokens_out=290,
             cost_usd=0.005, was_blocked=False, run_id=run.run_id)

summary = guard.end_run(run.run_id)  # EndRunResult(status=..., spend_usd=..., ...)
```
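
If the agent loop can raise, it may help to close the run in a `finally` block. The context manager below is a sketch under that assumption; `budgeted_run` is a hypothetical helper, not a noburn API, and it relies only on the `start_run()`, `run_id`, and `end_run()` names shown above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def budgeted_run(guard: Any, budget_cap_usd: float) -> Iterator[str]:
    """Start a run, yield its run_id, and end the run even if the body raises."""
    run = guard.start_run(budget_cap_usd=budget_cap_usd)
    try:
        yield run.run_id
    finally:
        # Runs on both normal exit and exceptions; the summary is discarded here.
        guard.end_run(run.run_id)


# Usage (with a real guard):
# with budgeted_run(guard, 0.5) as run_id:
#     decision = guard.check(model="gpt-4o", estimated_tokens_in=1000,
#                            estimated_tokens_out=300, run_id=run_id)
```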

## Behavior

- **`check()` raises only `NoburnAuthError`** (bad key). Every call asks the
  server for an authoritative decision; if the server is unreachable, it falls
  back to a local in-memory check.
- **`on_error`** controls that fallback when nothing local blocks the call:
  `"allow"` (default) fails open so a noburn outage never breaks your app;
  `"block"` fails closed (`block_reason="noburn_unreachable"`).
- **`start_run()` raises** on network/auth error (`NoburnTimeoutError` on
  timeout) — a run must be registered before you proceed.
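
The fallback rules above can be read as a small decision table. The function below illustrates that logic only and is not noburn's implementation: `fallback_decision`, `FallbackDecision`, and the `"local_cap_exceeded"` reason are hypothetical, while `"noburn_unreachable"` is the reason documented above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackDecision:
    blocked: bool
    block_reason: Optional[str] = None


def fallback_decision(local_cap_exceeded: bool, on_error: str = "allow") -> FallbackDecision:
    """Decide what happens when the server is unreachable (illustrative only)."""
    if local_cap_exceeded:
        # The local in-memory check blocks regardless of on_error.
        return FallbackDecision(True, "local_cap_exceeded")  # hypothetical reason
    if on_error == "block":
        # Fail closed: nothing local blocks, but the server could not confirm.
        return FallbackDecision(True, "noburn_unreachable")
    # Fail open: the default, so an outage does not break the app.
    return FallbackDecision(False)
```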

> Note: `check()`/`start_run()` use blocking HTTP. In async apps, run them in a
> thread executor (e.g. `await asyncio.to_thread(guard.check, ...)`).
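
For async code, the note above could be wrapped in a small coroutine. This is a sketch: `check_async` is a hypothetical helper, and it assumes only that `guard.check` is a blocking callable that accepts keyword arguments. `asyncio.to_thread` is available from Python 3.9.

```python
import asyncio
from typing import Any


async def check_async(guard: Any, **check_kwargs: Any) -> Any:
    """Run the blocking guard.check() in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(guard.check, **check_kwargs)


# Usage inside a coroutine (with a real guard):
# decision = await check_async(guard, model="gpt-4o",
#                              estimated_tokens_in=1000, estimated_tokens_out=300)
```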

## License

MIT
