Metadata-Version: 2.4
Name: gyre
Version: 0.1.0
Summary: Elegant Python API for Claude CLI agents
Project-URL: Homepage, https://gitlab.com/HugoFara/gyre
Project-URL: Repository, https://gitlab.com/HugoFara/gyre
Project-URL: Issues, https://gitlab.com/HugoFara/gyre/-/issues
Project-URL: Changelog, https://gitlab.com/HugoFara/gyre/-/blob/main/CHANGELOG.md
Author: Hugo Farajallah
License-Expression: MIT
License-File: LICENSE
Keywords: agent,ai,automation,claude,cli
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.2; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/logo.svg" alt="gyre" width="140">
</p>

# Gyre

A Python orchestration layer over the Claude Code CLI. Gyre drives
`claude -p` as a subprocess and gives you four primitives the official
Agent SDK doesn't: `check()` as control flow, count-constrained typed
list extraction, scoped `Memory` handles, and parallelism-capped loops
with a dumpable call log.

## What it is

Gyre is not an Anthropic API wrapper. It shells out to your locally
installed `claude` CLI, inheriting the CLI's auth, its built-in tools
(Read, Edit, Bash, Grep, …), MCP servers, and plan mode — without
reimplementing any of them.

What Gyre adds is the *loop*:

- **`check()` as control flow** — a yes/no Claude call that reads as
  ordinary Python (`while not await agent.check(...)`).
- **Typed extraction with count constraint** — `agent.get(T, ...)`
  and `agent.get_list(T, ..., n=N)` return Python values; `n=`
  injects `minItems`/`maxItems` into the JSON schema.
- **Memories as named handles** — `memorize()` returns a `Memory`
  object you pass into specific calls, scoped per call site rather
  than accumulating in a session buffer.
- **Budget, iteration, and parallelism caps** —
  `agent_context(budget_usd=..., max_iterations=..., max_parallel=...)`
  enforced across every call in the loop. The call that crosses raises
  *after* incurring its cost; that call still lands in the log.
- **A dumpable, replayable call log** — every call records a
  `CallRecord`; `agent.dump_log(path)` writes JSON for inspection or
  eval. Pass that path back via `agent_context(replay_from=...)` and
  the next run is served from the log without invoking the CLI.

If you're calling the Anthropic API directly with an API key,
Anthropic's official `claude-agent-sdk` is the right tool. If you're
driving the `claude` CLI and want loops that read as Python, this is
the library for that.

## Installation

```bash
pip install gyre
```

Requires the `claude` CLI on PATH and Python ≥ 3.11.

Gyre calls `claude -p` directly. If `claude --version` works in your
shell, gyre will work too. If not, install Claude Code first and run
`claude` once interactively to authenticate.

## Quick start

The smallest runnable example — save as a `.py` file, run with
`python`:

```python
import asyncio
from gyre import check, get


async def main() -> None:
    if await check("Is Python 3.11 or newer installed?"):
        version = await get[str]("What's the running Python version?")
        print(version)


asyncio.run(main())
```

Once that runs, the headline example below shows the loop shape this
library is actually built for.

## Headline example

A stewarded loop: generate candidates, score them, iterate until
"good enough", under a budget.

```python
import asyncio
from pydantic import BaseModel
from gyre import Tool, agent_context, memorize


class Draft(BaseModel):
    text: str


class Score(BaseModel):
    value: int
    reason: str


async def main() -> None:
    voice = memorize(
        "Brand voice: terse, no marketing fluff, no exclamation marks.",
        label="voice",
    )

    async with agent_context(
        tools=[Tool.Read],
        memories=[voice],
        max_iterations=20,
        budget_usd=2.00,
        max_parallel=4,
        model="sonnet",
    ) as agent:
        candidates = await agent.get_list(
            Draft, "Generate distinct intros for: kettle launch", n=8
        )

        while not await agent.check("Is the best candidate good enough to ship?"):
            scores = await asyncio.gather(*[
                agent.get(Score, f"Score this draft 1–10: {c.text}")
                for c in candidates
            ])
            survivors = [c for c, s in zip(candidates, scores) if s.value >= 7]
            if not survivors:
                break
            candidates = await agent.get_list(
                Draft,
                f"Produce 8 variations of these survivors: {[s.text for s in survivors]}",
                n=8,
            )

        winner = await agent.get(Draft, "Return the best candidate.")
        print(winner.text)
        print(agent.log_summary())
        agent.dump_log("run.json")


asyncio.run(main())
```

What the `agent_context` is doing for you:

- Every call inside flows through one budget and one iteration counter.
  When `cost_usd > 2.00` the next call raises `BudgetExceededError`
  *after* incurring its cost — the cost is recorded, not pre-paid.
- `max_parallel=4` is a real `asyncio.Semaphore` around `run_claude`,
  so the `gather(...)` fanout never has more than 4 in flight.
- `n=8` injects `minItems`/`maxItems` into the JSON schema and tells
  the model the target count in the prompt.
- `voice` is read fresh from disk on every call and injected as
  `<context label="voice">…</context>` — edit the file and the next
  call picks it up.
- Every call lands in `agent.log()` with kind, prompt, schema, tools,
  cost, and a truncated result; `dump_log` serializes it to JSON.

`budget_usd=2.00` here is intentionally tight — set so the budget
engages as a stop signal, not a worst-case bound. Raise it for longer
sweeps; the iteration count it buys depends on prompt size and model.

## Pydantic models

```python
from pydantic import BaseModel
from gyre import get_list


class Task(BaseModel):
    title: str
    priority: int
    done: bool


tasks = await get_list[Task]("Extract all tasks from this document")
```

The schema is generated from the model; the response is validated
against it before the value comes back.

## Memories

`memorize()` writes a string to disk and returns a `Memory` handle. By
default it writes to a temp file, which is fine for ad-hoc memories
that live with the process. For memories you want under version
control, pass `path=`:

```python
from gyre import memorize

# Default — temp file, gone with the process.
voice = memorize("Brand voice: terse.", label="voice")
# Memory(label='voice', path=PosixPath('/tmp/gyre_memory_…'), created_at=…)

# Versioned — exact path under your repo.
voice = memorize(
    "Brand voice: terse.",
    label="voice",
    path="memories/voice.md",
)
```

Pass the handle via `memories=[...]` to any call (top-level, or via
`agent_context(memories=...)`, or `Agent.with_memories(...)`):

```python
async with agent_context(memories=[voice]) as agent:
    await agent.check("On brand?")
```

Semantics, precisely:

- The file is **re-read on every call** that uses the handle. Edit the
  file, the next call sees the new content.
- The content is wrapped in `<context label="LABEL">…</context>` and
  appended to the user prompt.
- `path=` writes to exactly that path (creating parent directories);
  `directory=` writes a gyre-named file in that directory; passing
  both is a `ValueError`. Without either, the file lands in the system
  temp directory.
- A missing file at handle construction is `FileNotFoundError`; a file
  removed *after* construction raises on the next read (the failure is
  surfaced, not swallowed).
- Memories are not deduplicated. Pass them once.
- Memories do not persist across `agent_context` blocks — a new
  context starts with no implicit memories.

## `do()` requires a tool scope

Because `do()` can mutate the filesystem or shell state, it refuses to
run unless the tool list is locally readable:

```python
# raises MissingToolScopeError — tools not visible at the call site
await do("Edit src/foo.py")

# OK — explicit at the call site
await do("Edit src/foo.py", tools=[Tool.Read, Tool.Edit])

# OK — explicit at the surrounding scope
async with agent_context(tools=[Tool.Read, Tool.Edit]) as agent:
    await agent.do("Edit src/foo.py")

# OK — explicit on the Agent
agent = Agent().with_tools(Tool.Bash)
await agent.do("ls")
```

`check()`, `get()`, `get_list()` don't carry the same rule — they
return values, not actions.

For a side-effect-free preview, set `dry_run=True`. This forces
`PermissionMode.PLAN` (overriding any `permission_mode=` you passed)
and rejects any tool from `MUTATING_TOOLS` (`Bash`, `Edit`, `Write`,
`MultiEdit`, `NotebookEdit`) at config time:

```python
async with agent_context(tools=[Tool.Read, Tool.Grep], dry_run=True) as agent:
    plan = await agent.get(str, "What changes would you make to fix the test?")
```

Custom or MCP tools (raw strings, not `Tool` members) pass through
`dry_run` unclassified — gyre can't tell whether they mutate.

## Budget, iterations, parallelism

`agent_context` accepts three independent caps:

- `budget_usd`: cumulative USD across the session.
- `max_iterations`: number of successful calls.
- `max_parallel`: in-flight call count.

Limits are checked **after** each successful call (post-pay, not
pre-pay). The call that crosses raises:

- `BudgetExceededError(cost_usd, budget_usd)` — `cost_usd` is the
  actual cumulative cost including the failing call.
- `IterationLimitError(iterations, max_iterations)`.

Both errors leave the session log fully populated, including the
offending call. So `agent.log()` after the raise tells you exactly
what was attempted.

Costs come from the `total_cost_usd` field of the `claude` CLI's JSON
output, accumulated per call — gyre doesn't estimate from token
counts. The numbers are as accurate as the CLI's own cost reporting.

```python
async with agent_context(budget_usd=2.00) as agent:
    try:
        while not await agent.check("done?"):
            await agent.get(Step, "next step")
    except BudgetExceededError as e:
        print(f"hit ${e.cost_usd:.4f} of ${e.budget_usd:.4f}")
        print(agent.log_summary())
```

`max_parallel` is implemented as an `asyncio.Semaphore` around each
subprocess launch, so `asyncio.gather(...)` over agent calls naturally
respects the cap.

## Session log

Every call inside `agent_context` produces one `CallRecord`:

```python
@dataclass(frozen=True)
class CallRecord:
    kind: str            # "check" | "do" | "get" | "get_list" | "raw"
    prompt: str
    type_name: str | None
    schema: dict | None
    tools: tuple[str, ...] | None
    memory_labels: tuple[str, ...]
    session_id: str
    cost_usd: float
    duration_ms: int
    timestamp: datetime
    result: str          # full CLI result, untruncated — for replay/eval
    result_summary: str  # truncated to ~500 chars, for `log_summary()`
```

Inspection:

```python
agent.log()          # tuple[CallRecord, ...]
agent.log_summary()  # short text report
agent.dump_log("run.json")
```

`result` is the full untruncated CLI output, suitable for offline eval
or as the input to a future replay primitive. `result_summary` is the
500-char display version used by `log_summary()`. `dump_log` writes
both, so JSON files can be large for verbose runs — post-process if
that matters for your storage.

Memory contents are not inlined — only labels and sha256 content
hashes — so logs stay tractable even with large context files. The
hashes let replay distinguish two records with the same memory labels
but different content.

## Replay

Pass a log path back via `replay_from=` and the session serves calls
from the log instead of invoking the CLI:

```python
from gyre import agent_context

async with agent_context(replay_from="run.json") as agent:
    # Same code path as the original run, but every call is matched
    # against the log by content hash and returned without hitting
    # Claude. cost_usd, iterations, and the new log mirror the original.
    ...
```

Matching is by sha256 of the full call signature: `kind`, `prompt`,
`schema`, sorted tools, ordered `(memory_label, memory_content_hash)`
pairs, `system_prompt`, `append_prompt`, `model`, `permission_mode`,
and `working_dir`. Anything that affected the original output is in
the key; tool order is normalized (it doesn't affect output) and
memory order is preserved (it does).

Identical-input calls bucket FIFO: a `gather(...)` fanout of N
identical prompts in the original run replays as N matches in
insertion order, and an (N+1)-th call against an N-record log is a
miss.

On a miss, the default is to raise `ReplayCacheMissError` with a
short summary of the unmatched call. Pass
`replay_on_miss="passthrough"` to fall through to a live call
instead — useful when you're extending a run with one more step:

```python
async with agent_context(
    replay_from="run.json", replay_on_miss="passthrough"
) as agent:
    # Existing calls hit cache; one new call goes live.
    ...
```

`load_log(path) -> tuple[CallRecord, ...]` is the lower-level
primitive — useful when you want to read records yourself rather than
replay. Old logs (pre-replay) load with default values for the new
fields and only match calls with those same defaults.

Cost, iteration, and budget tracking apply during replay using the
*original* costs, so a replayed run hits the same `BudgetExceededError`
at the same point as the original. `max_parallel` is not enforced
during replay (the semaphore is a real-call concept).

## Top-level vs scoped

There are two ways to call gyre:

- **`agent_context` / `Agent`** — the primary surface. Carries tools,
  memories, budget, parallelism, and the call log.
- **Top-level `check`/`do`/`get`/`get_list`** — convenience for
  one-shot calls outside a loop. Inside an `agent_context`, they
  automatically inherit the session via a contextvar, so a
  `gather(...)` fanout of top-level calls is also budgeted. Outside
  any context, they run unbounded.

```python
from gyre import Tool, check, do, get, get_list

if await check("Is this Python file syntactically valid?"):
    await do("Format with black", tools=[Tool.Bash])

count = await get[int]("How many functions are in main.py?")
todos = await get_list[str]("List all TODO comments")
```

The `get[int]("...")` subscript form is the one-liner ergonomic; on
an `Agent` use the direct form: `await agent.get(int, "...")`.

Reach for `agent_context` when you have more than one call. Reach for
the top-level form when you have exactly one.

## How Gyre compares

If you've used Anthropic's official **`claude-agent-sdk`**, the
relationship is easy to describe. The Agent SDK is feature-rich and
calls the Anthropic API via a bundled native binary; Gyre shells out
to your installed `claude` CLI. The SDK already gives you typed
structured output, `max_budget_usd`, `max_turns`,
`permission_mode="plan"`, `allowed_tools`, and full MCP support — so
for most agentic work driven by an API key, it is what you want.

Gyre exists because four things are absent or awkward in the SDK:

- **`check()` as a boolean primitive.** The SDK has no
  `await check("?") -> bool`. You'd define a Pydantic model with a
  single `bool` field and unwrap. Gyre makes the boolean call the
  unit so loops read as `while not await agent.check(...)`.
- **`get_list(..., n=N)` count constraint.** The SDK supports list
  extraction but not count constraints; you'd add `minItems`/`maxItems`
  to the schema yourself. Gyre injects them and tells the model the
  target count in the prompt.
- **Scoped memory handles.** The SDK persists full conversation
  history to disk; it has no per-call labeled context blocks. Gyre's
  `Memory` handles are explicitly scoped — read fresh on each call,
  attached to the calls you choose, never the whole session buffer.
- **`max_parallel` cap.** The SDK does not expose a concurrency cap.
  Gyre wraps each subprocess launch in an `asyncio.Semaphore` so
  `gather(...)` fanout naturally respects the limit.
- **Hash-cache replay.** The SDK persists JSONL session histories but
  has no replay primitive. Gyre's `agent_context(replay_from=path)`
  matches calls against a dumped log by content hash of the full
  signature; misses raise by default, and identical-input duplicates
  bucket FIFO so parallel fanouts replay correctly.

There's also the architectural split. Gyre drives the CLI you've
already authenticated, so it inherits whatever auth your `claude`
install uses. The Agent SDK runs its own bundled binary and reads
`ANTHROPIC_API_KEY`. Pick on auth and primitive needs, not feature
count.

For pure structured-output libraries — **Instructor**, **BAML** —
Gyre is in a different category. Those validate single-call typed
responses; Gyre is built around the loop those calls live in.

## Errors

Gyre raises a small hierarchy under `GyreError`:

- `AgentTimeoutError`, `AgentExecutionError` — subprocess-level.
- `TypeExtractionError`, `SchemaValidationError` — response parsing.
- `BudgetExceededError`, `IterationLimitError` — session limits.
- `MissingToolScopeError` — `do()` called without a visible tool list.
