Metadata-Version: 2.4
Name: ai-producer
Version: 0.3.3
Summary: Reusable agentic-orchestration library: SQLite-WAL ledger + Claude Agent SDK dispatcher with per-agent budgets and kill switch.
Author: capt-pancakes
License: MIT License
        
        Copyright (c) 2026 capt-pancakes
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: claude-agent-sdk>=0.1.0
Requires-Dist: flask>=3.0
Requires-Dist: httpx>=0.27
Requires-Dist: python-dotenv>=1.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

# ai-producer

A reusable agentic-orchestration library: a long-running **Producer** process that polls a SQLite-WAL ledger, dispatches tasks to specialist agents (defined as system-prompt markdown files) via the [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-python), enforces per-agent daily budget caps, watches a kill switch, retries failures with audit-trail row-hash chaining, and pings a human (file marker + optional ntfy.sh push) when the queue stalls. Extracted from [Bastzee](https://github.com/capt-pancakes/Bastzee) — a fully agentic mobile game build — and parametrized for any project.

## Why this exists

When you have multiple specialist agents (Code, Design, QA, Art, Audio, Narrative, ...) cooperating without a human in the middle, you need a **bus**, a **budget**, and a **brake**.

- **Bus:** the ledger. Eight agents racing on the same SQLite table is fine if you have WAL + `BEGIN IMMEDIATE` + an optimistic-lock claim. Agents file their own downstream tasks via `LedgerClient.insert_task`, so the queue self-replenishes.
- **Budget:** per-agent daily $ caps + a project ceiling. Pre-dispatch reservation refuses the call before spending. Post-call accounting raises if the cap was breached after the fact.
- **Brake:** a kill-switch file (`agent-state/kill_switch`). Touch it to halt everything; remove it to resume. The Producer auto-arms it after a burst of failures (5 in 10 min) for one agent.

Everything else (escalations, morning reports, idle detection, audit chain) is plumbing in service of those three.

## Install

```bash
pip install ai-producer
```

Requires Python 3.11+. The Claude Agent SDK pulls in its own `claude` CLI subprocess; ensure `claude` is on your `PATH` for live runs.

## Quickstart

```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
producer init                  # interactive: collects project name + ntfy topic,
                               # then hands off to a Claude Code session that
                               # walks you through designing your specialist agents
producer run --once            # drain the queue once
producer run                   # long-lived loop (Ctrl-C to stop)
```

`producer init` is interactive by default on a TTY. It prompts for:
- **Project name** (default: directory basename)
- **ntfy topic** (default: random suffix; can be empty to disable notifications)
- **Subject prefix** (default: project name)

After scaffolding, `producer init` hands off to `claude` CLI with `ONBOARDING.md` — a brief that walks you through writing your specialist agent prompts. If `claude` is not on your `PATH`, you'll see a friendly skip message and can re-run `claude` later (it'll pick up ONBOARDING.md automatically).

`producer init` flags:

- `--dir <path>` — target directory (default: current)
- `--force` — overwrite existing files
- `--non-interactive` — skip the wizard; derive defaults from `--dir`
- `--no-chat` — finish scaffolding only; skip the bootstrap-chat handoff

After `producer init` exits, your project has:

- `producer.yaml` — config (role list, budgets, ntfy topic)
- `agents/<role>/system_prompt.md` — per-role prompt skeletons (filled in via the bootstrap chat)
- `agent-state/ledger.db` — initialized SQLite ledger
- `seed_tasks.example.json` — example seed-tasks file
- `ONBOARDING.md` — the bootstrap-chat brief (re-run `claude` here any time to re-engage)

Then seed your queue and run:

```bash
producer task add --agent code --title "First task" --description "..."
producer run                   # long-lived loop (Ctrl-C to stop)
```

Export `ANTHROPIC_API_KEY` (or add to `.env`) before running.

## Mental model

```
                         ┌──────────────────┐
                         │  producer.yaml   │
                         │  roles + caps +  │
                         │  toolsets        │
                         └────────┬─────────┘
                                  │ loaded once
                                  ▼
   ┌─────────────────┐   poll    ┌──────────────────┐    Claude
   │  SQLite-WAL     │◄──────────│   Producer       │───►Agent SDK
   │  ledger         │   claim   │   process        │    (per-role
   │                 │   write   │                  │     prompt +
   │  tasks          │           │  • dispatch      │     toolset)
   │  decisions      │           │  • retry 3×      │
   │  asset_registry │           │  • escalate      │
   │  budget         │           │  • idle notify   │
   │  audit_log      │           │  • morning report│
   └────────▲────────┘           └──────────┬───────┘
            │                               │
            │ insert_task (the keystone)    │ ntfy.sh +
            │                               │ agent-state/needs-attention.md
            └──── specialist agents ────────┘
                  (one per role, dispatched
                   per-task by Producer)
```

The keystone is **`LedgerClient.insert_task`**. Specialist agents file their own downstream work — Designer files `code` + `art` tasks for the spec it just shipped, Code files a `qa` task for the feature it just implemented, etc. The queue self-replenishes; the Producer never goes idle as long as the agents honor their downstream filing.

## Configuration reference

`producer.yaml` (full schema):

```yaml
project_name: my-project       # used in dispatch prompts + report headers
project_root: .                # paths below resolve relative to this

# Runtime paths.
kill_switch: agent-state/kill_switch
ledger_db: agent-state/ledger.db
escalations_dir: agent-state/escalations
reports_dir: agent-state

# Loop cadence.
poll_interval_seconds: 10
idle_threshold_polls: 12       # 12 × 10s = 2 min idle before notifying

notifications:
  ntfy_topic: ""               # "" disables ntfy (file marker still written)
  subject_prefix: MyProject

budget:
  project_ceiling_usd: 200     # hard ceiling on total project spend per day
  soft_cap_ratio: 0.8          # per-agent soft warning threshold
  agents:
    producer:
      daily_cap_usd: 5
      dispatch_reservation_usd: 0.05    # pre-flight reservation per dispatch
      allowed_tools: [Read, Glob, Grep, Bash]
      prompt: agents/producer/system_prompt.md   # path under project_root
    code:
      daily_cap_usd: 20
      dispatch_reservation_usd: 0.20
      allowed_tools: [Read, Edit, Write, Glob, Grep, Bash]
      prompt: agents/code/system_prompt.md
    # ...add as many roles as you need
```

Any role can be added; the only special role name is `producer` (used for the dispatch reservation). If you don't declare a `producer` role, the dispatcher logs a warning and uses a $0.05 fallback reservation per call.

## Operating

### Kill switch

```bash
producer kill "deploy in progress, freeze all agents"
producer resume
```

The Producer halts dispatch on the next poll (in-flight invocations finish their current call). It auto-arms after **5 failures by one agent within 10 minutes** as a safety net.

### Escalations

Beyond 3 retries, a task is moved to `escalated` and a markdown file is written to `agent-state/escalations/<task_id>.md` with the description, payload, and recommended action. Resolve by editing the task in the ledger (via `producer task` subcommands, or directly with sqlite3) and setting it back to `pending` / `killed` / re-routed.

### Morning report

```bash
producer report --date 2026-05-06   # writes agent-state/morning-report-20260506.md
```

The Producer also auto-emits this every 6 hours during `producer run`. Sections: shipped tasks by agent, escalations awaiting sign-off, hard-failed tasks, per-agent spend with `[CAP HIT]` / `(soft cap)` markers, assets registered, anomalies (kill switch armed, stuck-in-progress, near-soft-cap).

### Idle detection

After `idle_threshold_polls` consecutive empty polls, the Producer writes `agent-state/needs-attention.md` and pushes via ntfy.sh (if configured). When work resumes, a "back online" notification follows. Useful when an agent quietly stops filing downstream tasks.

### Web UI

```bash
producer ui                # default: hunts for a free port starting at 5050
producer ui --port 7000    # bind to a specific port (no hunting)
producer ui --no-browser   # don't auto-open the browser
```

A local Flask dashboard at `http://localhost:5050` (or the next free port) showing the queue, recent audit events, ntfy notifications, and budget. File tasks via a form, requeue/kill individual tasks, arm or disarm the kill switch, trigger a morning report. Polls `/api/state` every 2 seconds for a pseudo-live feel.

Bound to `127.0.0.1` only — no auth, by design. It's a local tool for the local user. Multiple producer instances can run side-by-side; each `producer ui` picks the next free port (5050, 5051, 5052, ...). Theming is driven by CSS custom properties in `producer/web/static/app.css` — re-theming is a single block override.

The ntfy panel only appears when `notifications.ntfy_topic` is set; it mirrors what the producer would push to your phone.

### Subscription auth

`ai-producer` accepts two auth modes:

- **API auth (default).** Set `ANTHROPIC_API_KEY` (or put it in `.env`).
  The SDK reports actual per-call cost; the dispatcher records it in
  `tasks.actual_cost_usd` alongside a `tasks.cost_usd` value computed from
  token counts × the bundled `producer/pricing.json`.
- **Subscription auth (fallback).** With no `ANTHROPIC_API_KEY` and a local
  Claude Code install (`~/.claude/.credentials.json` present), the invoker
  symlinks just the credentials file into its isolated `CLAUDE_CONFIG_DIR`
  — plugins, hooks, and skills are *not* inherited. Anthropic does not
  surface a per-call dollar figure under subscription auth, so
  `tasks.actual_cost_usd` is `NULL`; `tasks.cost_usd` (computed) is still
  populated for budget enforcement and reporting.

The pricing table in `producer/pricing.json` is bundled with the package.
Refresh it when Anthropic adjusts rates:

```bash
producer pricing update             # fetch, validate, write
producer pricing update --dry-run   # preview the diff
```

`producer run` warns once at startup if the table is more than 90 days old
(via `agent-state/needs-attention.md` and the morning report banner).

#### Inherit mode (OAuth / Keychain auth)

If you authed Claude Code via `claude login` on a recent macOS install, your
credentials live in the Keychain — not in `~/.claude/.credentials.json`. The
isolated subscription-auth fallback can't see them. Use **inherit mode**:

```yaml
auth:
  mode: inherit   # or set PRODUCER_AUTH_MODE=inherit
```

In inherit mode the dispatched agent inherits your host `~/.claude` config
directly. Plugins, hooks, and skills you have installed will load. To prevent
prompt-driven hangs, ai-producer automatically uses `bypassPermissions`
and sets `GIT_TERMINAL_PROMPT=0`.

If you leave `auth.mode: auto` (the default), ai-producer detects this
case and falls back to inherit mode automatically with a warning.

## Writing an agent prompt

Each role's `system_prompt.md` should follow this shape (see `producer/templates/agent_prompt_template.md`):

1. **Mission** — what this role produces; boundaries with adjacent roles.
2. **What you must read first** — the task, related specs, existing code idioms.
3. **Hard rules** — language version, lint rules, layout conventions, the human gates this role escalates instead of touching.
4. **Output discipline** — one deliverable per dispatch, structural conventions.
5. **Ledger discipline** — `LedgerClient.record_decision(...)` snippet for every dispatch.
6. **Queue self-replenishment (REQUIRED)** — `LedgerClient.insert_task(...)` snippet showing what downstream task(s) to file. **This is the keystone — without it the queue stalls and the Producer goes idle.**
7. **Stop condition** — explicit "stop after the deliverable lands and the ledger is updated."

The Producer's dispatch prompt sits ABOVE the system prompt and contains task ID, title, description, payload, and a short operating contract. The role-specific system prompt provides the standing instructions that apply to every dispatch of that role.

## Public API

```python
from producer import (
    LedgerClient, Task, AuditLogger, TaskClaimRace, LedgerError,
    BudgetTracker, BudgetExceeded,
    KillSwitchWatcher,
    ClaudeAgentInvoker, InvocationResult,
    ProducerConfig, RoleConfig, BudgetConfig, NotificationsConfig, load_config,
    Producer,
    AUDIT_GENESIS_PREV_HASH, new_id, utc_iso_now, utc_today,
)
```

`LedgerClient` is the most useful surface for hand-rolled scripts:

```python
from producer import LedgerClient

ledger = LedgerClient(db_path="agent-state/ledger.db")

# Add a task from an agent's body
ledger.insert_task(
    agent_owner="qa",
    title="Validate the new etching",
    description="Per spec X §3, run a focused balance pass.",
    payload={"feature": "etching_drip"},
    priority=4,
)

# Record a decision
ledger.record_decision(
    task_id="01HYK...",
    agent_owner="code",
    kind="implementation",
    decision_text="Implemented Y at path Z",
    rationale="Per spec §2.",
)

ledger.close()
```

## Limitations

- **Not designed for >1k tasks/hour.** The audit chain becomes a single-writer bottleneck under that load. If you hit it, shard via per-agent audit tables with a chain-of-chains roll-up — the planning notes in the source repo discuss this.
- **One Producer process per project.** Two Producers polling the same ledger is technically safe (the optimistic-lock claim handles it), but you'd be wasting work — pick one.
- **No web UI / dashboard.** Use [Datasette](https://datasette.io) or `sqlite3` directly to inspect the ledger; the morning report is the human-facing summary.
- **No multi-tenancy.** One project per ledger.

## Development

```bash
git clone https://github.com/<your-fork>/ai-producer
cd ai-producer
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest -v
```

114 tests covering: ledger CRUD + audit chain integrity, budget pre-dispatch reservation + cap enforcement, kill switch, invoker isolation + no-productive-output detection, dispatch with retry/escalate, idle detection, morning report rendering, CLI subcommands, interactive init scaffolding, and an end-to-end smoke with a stub echo agent.

## Acknowledgments

Extracted from [Bastzee](https://github.com/capt-pancakes/Bastzee), where the original `tools/agent_runtime.py` and `tools/producer.py` evolved through a real-world fully-agentic build. The architectural decisions — SQLite as the bus, agents own their downstream filing, file-based kill switch, file-marker + ntfy notifications — were forged there.

## License

MIT — see `LICENSE`.
