Metadata-Version: 2.4
Name: agentcap-guard
Version: 0.1.0
Summary: Provider-agnostic hard dollar budgets + loop detection for AI agent runs. Stops the spend before it happens.
Author: wangkevin2100-cell
License: MIT
Project-URL: Homepage, https://github.com/wangkevin2100-cell/agentcap
Project-URL: Repository, https://github.com/wangkevin2100-cell/agentcap
Project-URL: Issues, https://github.com/wangkevin2100-cell/agentcap/issues
Keywords: llm,ai-agents,budget,cost-control,openai,guardrails,loop-detection
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: extras
Requires-Dist: tiktoken>=0.7; extra == "extras"
Requires-Dist: rich>=13.0; extra == "extras"
Provides-Extra: dev
Requires-Dist: agentcap[extras]; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Dynamic: license-file

# agentcap

> Hard dollar budgets + zombie-loop detection for AI agent runs. A
> provider-agnostic SDK that stops the spend **before** it happens — not a
> dashboard that shows you the bill after a flaky tool burned $400 overnight.

`agentcap` wraps your OpenAI-compatible client in one line. Before every model
call it estimates the cost and refuses to exceed a hard cap, and it hashes
repeated call/tool cycles to catch agents stuck in zombie retry loops — tripping
a circuit breaker that kills the run with a typed exception. Local-first, no
account, bring your own keys.

## Why

Observability tools (LangSmith, Langfuse, Helicone) tell you a run cost $380
*after* it happened. Framework limits (`max_iterations`) count steps, not
dollars, and evaporate the moment you switch frameworks. `agentcap` is the
missing piece: an in-process, **pre-call** brake that's provider-agnostic and
dead simple to adopt.

## Install

```bash
pip install agentcap-guard              # from PyPI (import name stays `agentcap`)
```

Or from source:

```bash
pip install -e .            # core, zero required deps
pip install -e ".[extras]" # + tiktoken (accurate token counts) + rich
```

> Note: the PyPI distribution name is **`agentcap-guard`** (the bare `agentcap`
> name was already taken), but the import name and CLI command are still
> `agentcap` — `pip install agentcap-guard` then `import agentcap`.

Python 3.11+.

## Quick start

Wrap any OpenAI-compatible client:

```python
from openai import OpenAI
from agentcap import guard

client = guard(OpenAI(), budget=5.00)   # hard $5 cap for this run

# use it exactly like the normal client — reads pass through untouched
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
)
# a call that would push spend over $5 raises BudgetExceeded *before* sending.
# the same model/messages repeated past the loop threshold raises LoopDetected.
```

Works with **OpenAI, Azure OpenAI, DeepSeek, Alibaba Qwen (DashScope), and any
OpenAI-compatible endpoint** — agentcap only needs
`client.chat.completions.create`.

Or wrap a whole run (for frameworks / custom loops that aren't a plain client):

```python
import agentcap

with agentcap.budget(2.50) as cap:
    while not done:
        cap.check(model="claude-opus-4-1", prompt_text=prompt,   # pre-call brake
                  step_signature={"model": m, "messages": msgs}) # loop guard
        usage = call_model(prompt)
        cap.record_usage("claude-opus-4-1",
                         usage.prompt_tokens, usage.completion_tokens)
```

## See it stop a runaway agent (no API key needed)

```bash
python examples/demo_budget_stop.py
```

A flaky-tool agent that would loop forever on expensive calls, stopped hard at a
$0.50 cap. Then inspect what happened:

```bash
agentcap runs                # recent runs: spend, budget, outcome
agentcap show <run_id>       # full event timeline, with the call that tripped it
```

Run it against your real Qwen/OpenAI key:

```bash
set DASHSCOPE_API_KEY=sk-...     # your Qwen key
python examples/demo_real_qwen.py
```

## Use it in an autonomous agent loop

The whole point: when you let an agent run itself in a `while` loop, a hard
budget + loop guard makes "burned $400 overnight" and "stuck retrying the same
tool forever" *structurally impossible* — one `cap.check(...)` per step:

```python
import agentcap

with agentcap.budget(5.00) as cap:          # hard $5 ceiling for this run
    while not done:
        cap.check(model="gpt-4o",            # pre-call brake: raises BudgetExceeded
                  prompt_text=prompt,        #   before sending if it would exceed,
                  step_signature=step)       #   and LoopDetected on a zombie loop
        resp = call_model(prompt)
        cap.record_usage("gpt-4o",
                         resp.usage.prompt_tokens,
                         resp.usage.completion_tokens)
```

See the full, runnable demo (no API key needed) showing both a budget stop and a
zombie-loop stop:

```bash
python examples/demo_agent_loop.py
```

## CLI

| Command | What it does |
|---|---|
| `agentcap runs` | list recent runs with cost, budget, outcome |
| `agentcap show <run_id>` | full event timeline for one run |
| `agentcap pricing` | show the active pricing table |
| `agentcap doctor` | sanity-check: tokenizer, pricing coverage for used models |

## How it works

- **Pre-call cost estimation.** Prompt tokens (via `tiktoken` if installed, else
  a conservative char-based estimate) + a conservative max-output assumption ×
  a per-model price (`pricing.toml`). If `spent + estimated > budget`, the call
  is **aborted before sending**.
- **Reconciliation.** After each call, real token usage from the response
  updates the running total — so the budget is enforced against truth, not just
  estimates.
- **Loop detection.** Each step's `(model, messages)` / `(tool, args)` signature
  is hashed; if the same hash repeats past the threshold within a rolling
  window, that's a zombie loop. An allowlist exempts legitimately-repeating
  calls (polling, pagination).
- **Circuit breaker.** On budget-exceed or loop-detected, the breaker latches:
  a typed exception (`BudgetExceeded` / `LoopDetected`) is raised and all further
  calls through the wrapped client re-raise it. One hard stop.
- **Audit log.** Append-only JSONL (one line per event) is the source of truth;
  a SQLite roll-up powers `agentcap runs`. State lives in `.agentcap/` (override
  with `AGENTCAP_HOME`).

## Pricing table

`pricing.toml` is **data, not code** — edit it, or point agentcap at your own
copy with `AGENTCAP_PRICING=/path/to/pricing.toml`. Unknown models fall back to
a deliberately high default so costs over-estimate rather than silently
under-count. `agentcap doctor` warns when a run used a model missing from the
table.

## Status

MVP. Core (budget + loops + breaker + audit + CLI) is implemented and tested
offline. Not yet built: the HTTP sidecar proxy for non-Python stacks, the Anthropic
native wrapper, and the hosted Team tier (shared budgets + Slack alerts). See
`../pain-radar/specs/02-agent-budget-guardrails/spec.md` for the full plan.

## License

MIT.
