Metadata-Version: 2.4
Name: marcus-mini
Version: 0.1.1
Summary: Board-mediated multi-agent coordination in ~500 lines of Python
Author-email: Larry Gray <lwgray@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/lwgray/marcus-mini
Project-URL: Repository, https://github.com/lwgray/marcus-mini
Project-URL: Issues, https://github.com/lwgray/marcus-mini/issues
Project-URL: Changelog, https://github.com/lwgray/marcus-mini/releases
Keywords: multi-agent,ai,coordination,llm,claude,kanban,automation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: anthropic>=0.40.0
Requires-Dist: click>=8.0.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: mypy>=1.8; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: isort>=5.12; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="assets/logo.png" alt="marcus-mini" width="220">
</p>

<p align="center">
  <strong>Board-mediated multi-agent coordination in ~500 lines of Python.</strong>
</p>

<p align="center">
  <em>Multiple AI agents work in parallel on a shared task board.<br>
  They never talk to each other — they coordinate exclusively through SQLite.</em>
</p>

<p align="center">
  <a href="https://pypi.org/project/marcus-mini/"><img alt="PyPI" src="https://img.shields.io/pypi/v/marcus-mini?color=blue"></a>
  <a href="https://pypi.org/project/marcus-mini/"><img alt="Python" src="https://img.shields.io/pypi/pyversions/marcus-mini"></a>
  <a href="https://github.com/lwgray/marcus-mini/actions/workflows/ci.yml"><img alt="CI" src="https://github.com/lwgray/marcus-mini/actions/workflows/ci.yml/badge.svg"></a>
  <a href="LICENSE"><img alt="License" src="https://img.shields.io/badge/license-MIT-yellow"></a>
</p>

<p align="center">
  <a href="#quickstart">Quickstart</a> ·
  <a href="#commands">Commands</a> ·
  <a href="#how-it-works">How it works</a> ·
  <a href="#the-evolution-marcus">Marcus</a>
</p>

---

## What it looks like

```
$ mini build "a snake game in Python"

  Decomposing goal → 8 tasks
  Spawning 3 agents (DAG recommends 3) in tmux session 'marcus-snake-game-1146'
    Spawned agent-1 (pane 0)
    Spawned agent-2 (pane 1)
    Spawned agent-3 (pane 2)

  ✓ 3 agents running. mini watch to follow along.
```

`mini dag` shows the decomposed work as it's running:

```
  ╭────────────────────╮
  │ project structure  │
  ╰────────────────────╯
            │
  ┌─────────┴─────────────┐
  ▼                       ▼
╭────────────────────╮  ╭────────────────────╮
│  game state model  │  │   render engine    │
╰────────────────────╯  ╰────────────────────╯
            │                       │
            └────────┬──────────────┘
                     ▼
          ╭────────────────────╮
          │   collision logic  │
          ╰────────────────────╯
                     │
                     ▼
          ╭────────────────────╮
          │     game loop      │
          ╰────────────────────╯
```

`mini bench` measures coordination overhead after the run completes:

```
Bench — snake-game-1146
─────────────────────────────────────
  Wall time         8m 34s
  Agent work        18m 12s
  Utilization       70.8%
  Coordination tax  29.2%
  Tasks             8 (8 done)
```

---

## The idea

Most multi-agent frameworks let agents talk to each other directly. `marcus-mini`
does the opposite: agents are **blind to each other** and coordinate only through
a shared board.

Three invariants hold in every run:

1. **Agents self-select work.** The board assigns nothing — agents pull the next
   available task whose dependencies are all complete.
2. **Agents make all implementation decisions.** The board says *what* to build,
   never *how*.
3. **Agents communicate only through the board.** Artifacts and decisions logged
   to a task become context for downstream agents. No direct messages, no shared
   memory outside the board.

This mirrors how distributed teams actually work: a shared backlog, async
handoffs, no mandatory standups.

---

## Quickstart

**Requirements:** Python 3.11+, [Claude Code CLI](https://claude.ai/code), tmux

```bash
pip install marcus-mini
export MINI_API_KEY=sk-ant-...   # your Anthropic key — for decomposition only
mini build "a snake game in Python"
```

> **Why `MINI_API_KEY` and not `ANTHROPIC_API_KEY`?**
> Claude Code agents inherit your shell environment. If `ANTHROPIC_API_KEY` is set,
> agents use it for every API call — even if you have a Claude subscription — and
> you get charged. `MINI_API_KEY` is only read by the decomposer (one call per
> `mini build`). Agents run under your subscription key, not this one.

`mini build` will:
1. Call Claude to decompose the goal into a parallel task DAG
2. Persist all tasks to a local SQLite board
3. Spawn N agents in a tmux session (N = DAG width)
4. Each agent loops: claim task → implement → log artifacts → mark done

Watch the board live:

```bash
mini watch          # live-refreshing kanban
mini status         # per-agent activity
mini bench          # coordination metrics after completion
```

---

## Install

```bash
pip install marcus-mini
```

Or from source:

```bash
git clone https://github.com/lwgray/marcus-mini
cd marcus-mini
pip install -e .
```

---

## Commands

### Build & monitor

| Command | Description |
|---|---|
| `mini build "goal"` | Decompose goal, spawn agents |
| `mini watch` | Live kanban board (auto-exits on completion) |
| `mini board` | Static kanban snapshot |
| `mini status` | Per-agent activity with staleness warnings |
| `mini dag` | ASCII dependency graph |
| `mini tasks` | Task list with dependency info |
| `mini progress` | Completion percentage |
| `mini time` | Project elapsed time |
| `mini logs` | Tail agent log files |
| `mini wait` | Block until all tasks reach DONE/FAILED (scripting-friendly) |

### Measure

| Command | Description |
|---|---|
| `mini bench` | Wall time, utilization, coordination tax |

### Manage projects

| Command | Description |
|---|---|
| `mini projects` | Running projects (`--all` to see completed too) |
| `mini load` | Total live agents across all running projects |
| `mini open` | Print output directory (`cd $(mini open)`) |
| `mini stop` | Kill the current project's agents (DB + tmux) |
| `mini stop --all` | Stop every running project (type STOP to confirm) |
| `mini purge` | Wipe every project from the DB (type DELETE to confirm) |
| `mini config` | View/set API key env var |

### Common flags

```bash
mini build "goal" --agents 4          # override agent count
mini build "goal" --output-dir ~/myproject
mini watch --interval 5               # refresh every 5s
mini bench --project my-project-1200
mini wait --interval 30 --timeout 5400  # 90-min ceiling
```

---

## How it works

```
mini build "snake game"
        │
        ▼
  ┌─────────────┐
  │  decomposer │  Claude call → flat task list + dependency DAG
  └──────┬──────┘
         │
         ▼
  ┌─────────────┐
  │    board    │  SQLite (WAL mode) — the shared environment
  └──────┬──────┘
         │  MCP server exposes 5 tools to each agent
         │
    ┌────┴────┐
    │         │
  agent-1   agent-2   ...   agent-N     (tmux panes)
    │         │
    └─────────┘
    read/write the same board, never each other
```

### Board MCP tools (what agents see)

| Tool | Purpose |
|---|---|
| `request_next_task` | Claim next task whose deps are all DONE |
| `log_artifact` | Store output (API spec, schema, file path) |
| `log_decision` | Record an architectural choice |
| `get_task_context` | Read artifacts from dependency tasks |
| `report_done` | Mark task complete, unblock dependents |

### Task assignment (atomic SQL)

A task is claimable when `status = 'TODO'` and every dependency has
`status = 'DONE'`. The claim runs inside `BEGIN EXCLUSIVE` with a
`json_each()` dependency check — no race conditions, no external lock manager.

### Coordination tax

`mini bench` measures the gap between theoretical and actual parallelism:

```
agent utilization  =  total task work  /  (n_agents × wall_time)
coordination tax   =  1 − utilization
```

A tax of 0 % means every agent was always working. Typical software projects
land at 30–60 % because of the critical path.

---

## Project structure

```
marcus-mini/
├── marcus_mini/
│   ├── board.py          # SQLite board + async API
│   ├── board_server.py   # MCP server (5 tools)
│   ├── cli.py            # mini CLI (Click)
│   ├── decomposer.py     # Claude → task DAG
│   ├── models.py         # Task dataclass
│   ├── monitor.py        # tmux monitor pane
│   └── spawn.py          # tmux agent spawner
├── prompts/
│   └── agent_prompt.md   # agent loop instructions
├── tests/
└── pyproject.toml
```

---

## Design discipline: the Mini Red Line

`marcus-mini` is intentionally small. Every proposed feature has to pass one test:

> **"Does coordination break without this?"**
> Yes → allowed. No → it doesn't belong in mini.

What's ruled out, even if it would be useful:

- Observability beyond `mini status` (no dashboards, metrics pipelines)
- Resilience infrastructure (no retry, circuit breakers, fallbacks)
- Rich configuration (one flat JSON file, period)
- External integrations (no Slack, GitHub, webhooks)
- Agent capability management (no skills, tools, specializations)
- Scheduling (no cron, recurring tasks)

What stays, even if it adds lines of code:

- Correctness fixes (stall detection, accurate liveness checks)
- Coordination primitives (task claiming, dependency resolution, spawning)
- Measurement (`mini bench`, timing, coordination tax)

If a feature would be at home in [Marcus](https://github.com/lwgray/marcus),
it doesn't belong in mini.

---

## The evolution: Marcus

`marcus-mini` proves the concept on one stack: SQLite, Claude, tmux. Three
moving parts, ~500 lines of coordination, intentionally crippled where Marcus
shines.

[**Marcus**](https://github.com/lwgray/marcus) takes the same board-mediated
primitives and turns them into a production platform. Here is what Marcus does
that mini cannot:

| Capability | mini | Marcus |
|---|---|---|
| **Contract-first decomposition** — agents agree on APIs/schemas before any code is written, so coordination works in domains without code (legal, scientific, design) | flat task DAG only | full contract synthesis pre-fork |
| **Observability** — structured run logs, experiment tracking, cross-run comparison, dashboards | `mini status` + `mini bench` | full telemetry pipeline + Cato dashboard |
| **Resilience** — agent retry, circuit breakers, error taxonomy, automatic stall recovery, lease-based work claiming | none (fails loud) | error framework + lease/recovery layer |
| **Provider abstraction** — board protocol independent of which LLM or which agent runtime | Claude + Claude Code only | multi-provider board protocol |
| **Domain extensibility** — coordinate non-software work via contract templates | software builds only | research, ops, content, analysis |
| **Multi-team / multi-project** — concurrent boards, RBAC, shared artifact registry | single user, single board | team and tenant isolation |
| **Configuration** — environments, profiles, per-agent tuning | one flat JSON | hierarchical config + per-environment overrides |
| **Scheduling** — recurring tasks, cron, triggered runs | one build at a time | full scheduler |

In short: **mini is the experiment. Marcus is the platform.** If you find
yourself wanting any row in the right column, you have already outgrown mini —
graduate to Marcus.

### Known scope limitations of mini

- **Claude-only.** The decomposer uses the Anthropic SDK directly, and agents
  are Claude Code processes. There is no provider abstraction. This is a
  deliberate scope choice for a research instrument — Marcus handles
  multi-provider.
- **Single user, single key.** No multi-tenancy, no RBAC, no team workflows.
- **Local only.** SQLite + tmux on one machine. No remote agents, no cluster.
- **Best effort failure handling.** If an agent dies or the API quota runs out,
  mini stops loud — there is no auto-recovery. By design.

---

## Why not CrewAI / AutoGen / LangGraph?

Those frameworks route messages between agents. `marcus-mini` does not. Agents
in this system are genuinely autonomous — they pull work, they decide how to do
it, they post results. The board is the only channel.

This means: parallelism is structural (DAG-derived), not programmed. You don't
write agent communication logic. You write a goal, and the board handles the rest.

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Bug reports and PRs welcome — but new
features must pass the Red Line test above.

---

## License

[MIT](LICENSE)
