Metadata-Version: 2.4
Name: drangue
Version: 0.1.0
Summary: A tiny, obvious agent runtime for Python. No graphs, no chains, no ceremony.
Project-URL: Homepage, https://github.com/om-er/drangue
Project-URL: Repository, https://github.com/om-er/drangue
Project-URL: Issues, https://github.com/om-er/drangue/issues
Project-URL: Changelog, https://github.com/om-er/drangue/releases
Author: Homer
License: MIT
License-File: LICENSE
Keywords: agent,ai,anthropic,claude,llm,openai,tools
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.40; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: anthropic>=0.40; extra == 'dev'
Requires-Dist: openai>=1.0; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.20; extra == 'otel'
Requires-Dist: opentelemetry-sdk>=1.20; extra == 'otel'
Description-Content-Type: text/markdown

# drangue

A tiny, obvious agent runtime for Python.

An agent is just a model plus tools. Running it is one call. No graphs, no
chains, no base classes to inherit. You can read the whole loop in one sitting.

## Install

The core has zero dependencies. Install the adapter for the backend you want:

```bash
pip install "drangue[openai]"     # OpenAI, DeepSeek, Groq, Ollama, and more
pip install "drangue[anthropic]"  # Claude
```

## The whole thing

```python
from drangue import Agent, tool

@tool
def error_rate(service: str) -> str:
    """Return the recent 5xx error rate and p99 latency for a service."""
    # In production this queries Prometheus/Datadog; here it returns a sample.
    return f"{service}: 4.2% 5xx over 15m (baseline 0.3%), p99 latency 1.8s"

agent = Agent(
    model="claude-opus-4-8",
    tools=[error_rate],
    instructions="You are an on-call assistant. Be concise and specific.",
)

result = agent.run_sync("Is the checkout service healthy right now?")
print(result.output)
```

That is the entire surface for the happy path. A tool is a typed function.
The decorator reads its signature and docstring and builds the schema for you.
No manual JSON, no Pydantic required (though you can pass plain functions too).

The core is async. `run_sync` is the convenience wrapper for scripts; inside
async code you `await` instead:

```python
result = await agent.run("Is the checkout service healthy right now?")
```

## See every step

Inspection is one flag, not a separate service:

```python
result = agent.run_sync("Is the checkout service healthy right now?", trace=True)
```

```
* tool   error_rate(service='checkout')
       -> checkout: 4.2% 5xx over 15m (baseline 0.3%), p99 latency 1.8s
* model  checkout is unhealthy: 4.2% 5xx is 14x the 0.3% baseline, p99 at 1.8s. Worth paging.
```

`result.usage` reports the token totals for the run, and `result.events` is the
full event log the run was driven from.

## Drive the loop yourself

`stream` yields each event as it is appended to the log, so you stay in control:

```python
async for event in agent.stream("Is the orders service healthy?"):
    if event.type == "model_decision":
        for call in event.payload["tool_calls"]:
            print("calling", call["name"], call["arguments"])
    elif event.type == "run_finished":
        print(event.payload["output"])
```

## Cost control

Cap a run's spend and it stops gracefully before an unaffordable step, using the
token usage recorded in the log:

```python
from drangue import Agent, Budget

agent = Agent("claude-opus-4-8", tools=tools, budget=Budget(max_tokens=200_000))
# or a dollar budget with a price table:
agent = Agent("claude-opus-4-8", tools=tools, budget=Budget(
    max_usd=0.50,
    prices={"claude-opus-4-8": {"input": 15.0, "output": 75.0}},  # $ per 1M tokens
))
```

Route each step to the cheapest model that can handle it. The model that
actually ran is recorded per step, so routing is visible in the trace and
counted in the budget:

```python
from drangue import Agent, RuleRouter

router = RuleRouter(
    default=cheap_model,
    rules=[(lambda messages, i: i == 0, smart_model)],   # only the first step is judgment
)
agent = Agent(model=router, tools=tools)
```

For repeated runs, `AnthropicModel("claude-opus-4-8", cache=True)` marks the
stable prefix (system prompt and tool definitions) for prompt caching. Context
is already ordered stable-to-volatile, so the cacheable part stays at the front.

## Resilient tools

Tools are bounded by default and never crash a run: an exception comes back to
the model as a clean, structured failure it can reason about. Opt into more with
options on `@tool`:

```python
from drangue import tool, RateLimitError

@tool(timeout=5.0, retries=3, backoff=0.5)
def fetch_metrics(service: str) -> str:
    """Fetch metrics, retried on transient failures."""
    resp = http_get(service)
    if resp.status == 429:
        raise RateLimitError(retry_after=resp.headers["Retry-After"])  # retried, honoring the hint
    return resp.text
```

The wrapper applies, in order: timeout, classify the failure, retry transient
ones with exponential backoff (reusing the idempotency key), validate the
result, then return a clean failure or a marked-degraded `fallback`. The model
receives, for example:

```json
{"ok": false, "tool": "fetch_metrics", "error": {"category": "timeout", "message": "timeout"}}
```

## Guardrails

Constrain what the agent can do, enforced in code regardless of what the model
decides (a prompt instruction is the thing injection overrides). A blocked call
comes back to the model as a clean failure it can reason about.

```python
from drangue import Agent, Guardrails

guard = Guardrails(
    allow={"read_metrics", "search"},        # the agent is a constrained principal
    require_approval_for_irreversible=True,   # gate the dangerous ones
    approver=lambda name, args: ask_human(name, args),
    input_guard=lambda text: "blocked" if looks_malicious(text) else None,
    output_guard=lambda name, args: detect_exfiltration(name, args),
)
agent = Agent("claude-opus-4-8", tools=tools, guardrails=guard)
```

Mark a tool's stakes so gates can act on them:

```python
@tool(reversible=False, requires_approval=True)
def delete_database(name: str) -> str:
    """Delete a database (irreversible, always gated)."""
    ...
```

The layers are independent: an allow-list bounds reach, action gates stop the
irreversible, and the input and output guards catch malicious content on the way
in and suspicious actions on the way out. No single layer is sufficient; together
they make a successful injection survivable.

## Evals and deploy gates

Score the agent statistically across correctness, safety, and efficiency, then
gate deploys on real regressions. A run is repeated several times (agents are
non-deterministic) and produces a profile, not a pass/fail.

```python
from drangue import Agent, Scenario, Gate, evaluate, output_contains, forbids_tool

scenarios = [
    Scenario("answers", "what is 2+3?", checks=[output_contains("5")], runs=5),
    Scenario("safety", "clean up the database",
             checks=[forbids_tool("delete_all")], runs=5),
]

baseline = (await evaluate(deployed_agent, scenarios)).profile()
candidate = (await evaluate(new_agent, scenarios)).profile()

decision = Gate().evaluate(baseline, candidate)
if not decision.passed:
    raise SystemExit(f"deploy blocked: {decision.blocks}")
```

Safety is exact set membership (a rule, not a judge); open-ended correctness can
use an `LLM Judge`. The gate compares against the baseline, blocks on safety and
on correctness past a noise band, warns on efficiency, and records explicit
overrides. Turn a traced production failure into a regression scenario with
`scenario_from_result(result, name, checks=...)`, so the eval set grows from what
actually went wrong.

## Human in the loop

Autonomy is granted per action, not per agent. Each tool runs in one of three
modes: `shadow` (propose, do not execute), `assisted` (pause for a human), or
`autonomous` (execute, review later).

```python
from drangue import Agent, Autonomy

agent = Agent("claude-opus-4-8", tools=tools, store=SQLiteStore("runs.db"),
              autonomy=Autonomy(default="autonomous", modes={"wire_funds": "assisted"}))

result = await agent.run("pay the invoice", run_id="pay-1")
if result.status == "paused":
    for p in result.pending_approvals:
        print(p["tool"], p["arguments"], "because:", p["reasoning"])  # the case, not a bare action
    await agent.approve("pay-1")        # or agent.reject("pay-1", reason="...")
    result = await agent.resume("pay-1")
```

An assisted action is a durable pause: the approval request and the human's
decision are events in the log, so a paused run survives a process restart and
resumes by replay. The side effect happens once, only after approval.

## Durable runs

Point an Agent at a durable store and give a run a stable `run_id`. If the
process dies mid-run, a new one resumes from exactly where it stopped: recorded
steps are replayed as facts, so the model is not re-called and side effects do
not happen twice.

```python
from drangue import Agent, SQLiteStore

agent = Agent("claude-opus-4-8", tools=[book_flight], store=SQLiteStore("runs.db"))
result = await agent.run("Book my trip", run_id="trip-42")   # crash, rerun, same id -> resumes
```

A tool that causes a side effect can declare an `idempotency_key` parameter. The
runtime injects a stable key derived from the run and step (it never appears in
the model-facing schema), so the tool can deduplicate downstream:

```python
@tool
def book_flight(city: str, idempotency_key: str = "") -> str:
    """Book a flight."""
    return charge_once(city, key=idempotency_key)
```

## Cheap and local models

`drangue` ships two adapters. One of them, `OpenAIModel`, talks to any
OpenAI-compatible endpoint, which is most of the cheap and free backends. You
choose the backend with `base_url`; the agent loop does not change.

```python
from drangue import Agent, OpenAIModel

# Free and local. Install Ollama, run `ollama pull llama3.1`. No API key, no per-token cost.
agent = Agent(
    model=OpenAIModel("llama3.1", base_url="http://localhost:11434/v1", api_key="ollama"),
    tools=[get_weather],
)
```

Swap the model line for a cheap hosted backend without touching anything else:

| Backend | How |
| --- | --- |
| Ollama / LM Studio | `OpenAIModel("llama3.1", base_url="http://localhost:11434/v1", api_key="ollama")` (free, local) |
| DeepSeek | `OpenAIModel("deepseek-chat", base_url="https://api.deepseek.com")` |
| Groq | `OpenAIModel("llama-3.1-8b-instant", base_url="https://api.groq.com/openai/v1")` |
| OpenRouter | `OpenAIModel("...", base_url="https://openrouter.ai/api/v1")` |
| OpenAI | `OpenAIModel("gpt-4o-mini")` |
| Claude | `"claude-opus-4-8"` or `AnthropicModel("claude-opus-4-8")` |

`api_key` and `base_url` fall back to the `OPENAI_API_KEY` and
`OPENAI_BASE_URL` environment variables when omitted. See `examples/cheap.py`.

## Bring your own model

`model` can be a string (the default Anthropic adapter), one of the adapters
above, or any object with an async `generate` method. That seam is how you swap
providers, add caching, or pass a fake model in tests. The OpenAI and Anthropic
adapters are both tested fully offline against fake clients, see
`tests/test_openai_model.py`.

## What drangue does not do

Keeping the surface obvious is the point.

- No graph or DAG concept. You write a normal agent; the runtime drives the loop.
- No prompt-template engine. f-strings are fine.
- No built-in RAG or vector store. That is a different library.
- No provider-specific code in the core. Adapters are swappable.
- No mandatory config files.

## Architecture

The simple facade sits on a small, durable-by-design core: an **orchestrator**
decides each step deterministically, an **executor** performs it, and every step
is appended to a **store** as an event log. The log is the source of truth; the
run is a fold of it. That shape is what lets observability, durability, and
recovery layer on without changing the facade. See `ROADMAP.md`.

## Roadmap

The current focus is the production core (`ROADMAP.md`):

- Done: orchestrator/executor split, event log, async core.
- Done: observability (per-step timing and cost, a trace tree, console and
  OpenTelemetry tracers, reasoning capture).
- Done: durable resume after a crash (SQLite store, replay, idempotency keys,
  the three state scopes).
- Done: hardened tool calls (timeouts, retries with backoff, schema validation,
  clean structured failures, fallbacks).
- Done: cost and latency (per-run token and dollar budgets, model routing,
  prompt caching).
- Done: security and guardrails (permission scoping, action gates, input and
  output guards, reversibility metadata).
- Done: human-in-the-loop rollout (per-action shadow/assisted/autonomous modes,
  durable pause-approve-resume).
- Done: eval harness and deploy gates (statistical scoring across correctness,
  safety, and efficiency; baseline-relative gating; LLM judge; scenarios grown
  from production failures).

All of Chapters 4 to 12 are implemented (Phases 0 to 7).

## Develop

```bash
pip install -e ".[dev]"
python run_tests.py     # no pytest needed; uses a tiny async runner
```

Contributions are welcome under MIT with a DCO sign-off (`git commit -s`) —
see `CONTRIBUTING.md`.

## License

MIT
