Metadata-Version: 2.4
Name: wonderland-ai
Version: 0.9.0
Summary: An identity-native multi-agent system that runs the full software development lifecycle.
Project-URL: Homepage, https://github.com/KohlJary/wonderland-ai
Project-URL: Repository, https://github.com/KohlJary/wonderland-ai
Project-URL: Issues, https://github.com/KohlJary/wonderland-ai/issues
Project-URL: Documentation, https://github.com/KohlJary/wonderland-ai/blob/main/WONDERLAND_SPEC.md
Author-email: Daedalus <kohlbern@gmail.com>
License: MIT
License-File: LICENSE
Keywords: agents,anthropic,claude,identity,multi-agent
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: aiosqlite>=0.20
Requires-Dist: anthropic>=0.40
Requires-Dist: platformdirs>=4.0
Requires-Dist: plotext>=5.3.2
Requires-Dist: pydantic>=2.10
Requires-Dist: python-ulid>=3.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: textual>=8.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8.3; extra == 'dev'
Requires-Dist: redis>=5.2; extra == 'dev'
Requires-Dist: ruff>=0.8; extra == 'dev'
Provides-Extra: redis
Requires-Dist: redis>=5.2; extra == 'redis'
Description-Content-Type: text/markdown

# Wonderland

**An identity-native multi-agent system that runs the full software development lifecycle — discovery → planning → design → implementation → verification — as a cast of characters who carry the project across sessions.**

> Generic AI agents perform roles. Identity-native agents inhabit them.

![Wonderland dashboard](img/dash.png)

> ### ▶ See it in action
>
> [**Mvp-demo2 — first end-to-end Tier 2 autonomous pilot**](./src/wonderland/closet/analyses/034-mvp-demo2-autonomous-pilot.md).
> A full-stack markdown notebook produced autonomously: discovery
> → milestone-plan → 3 design+implement+verify cycles. **$83.78**
> total spend, **5000 lines** of code (~3400 application + ~1600
> test), **61 passing tests**, SQL-injection-safe handlers, custom
> React hooks, modular components. The operator was a gate-approver
> — pressed buttons at milestone-close gates, skipped a few
> duplicate features, watched the team build. Zero killed runs,
> zero hand-edited artifacts, one mid-pilot substrate fix.
>
> If you're trying to figure out what this project actually *is*,
> start there. (For the earlier single-directive demo of one-shot
> code generation, see [the Geocities pilot](./SHOWCASE.md).)

Wonderland is a cast of agents — each named after an Alice-in-Wonderland
character — that collaborate across the entire software production pipeline.
The Cheshire Cat is the architect. The White Rabbit is the project manager.
The Mad Hatter is QA. Alice inhabits personas to write stories the team
designs against. The Tweedles ship code; the Caterpillar reviews it. Every
character has a stable self-model (a "constitution"), persistent per-agent
memory, and a working relationship with the others.

The earlier framing was "multi-agent development system" — a fair description
when the entry surface was `directive in → features out`. The project has
since grown into something closer to **an end-to-end emulation of how a small
software team actually produces shipping product**: a discovery phase that
interviews the operator about personas and scope, a planning phase that
organizes the captured requirements into sequenced milestones, then
per-milestone design and implementation passes that close coverage loops the
substrate verifies. The Wonderland-flavored part is unchanged — identity
still does the work — but the surface the operator interacts with is now the
flow, not the meeting.

The architectural claim is that **identity does real work**. An agent with a
constitution it inhabits across many threads behaves differently from an
agent reconstructed from a system prompt each turn. It accumulates judgment.
It develops calibrated views of its colleagues. It refuses to cross domain
boundaries because the boundary is part of who it is, not a policy applied
from outside. Whether that compounds into measurably better outcomes than a
generic-agents baseline is what the eval harness in P7 will measure; until
then, the [`analyses/`](./analyses) directory tracks the qualitative
observations as the system gets built out.

Five corollaries follow from this, most of them anchored in field analyses:

1. **Identity-based architecture lets smaller models outperform their
   expected capabilities.** The constitution carries the discipline a
   generic agent would have to invent turn-by-turn, so a small model
   acting in character can hold its own against a large model with a
   generic prompt. Default target is Claude Haiku 4.5; the falsifier
   is P7's generic-baseline-vs-identity-native eval.
   ([analysis 004](./analyses/004-first-race.md))
2. **Failure modes are part of identity.** Each constitution's §VIII
   names the specific shadow each virtue decays into — the
   Sephirah/Qlipha pairing, where the corruption is structural rather
   than additive. Agents course-correct from inside instead of waiting
   for an external guardrail.
   ([analysis 004](./analyses/004-first-race.md))
3. **Character-shaped agents degrade visibly rather than silently when
   parts of the system fail.** When the bus dropped feature-composition
   artifacts mid-run, the Tweedles noticed the contradiction and
   reached for the disk-resident artifacts via their tools — recovery
   was emergent, not designed.
   ([analysis 027](./analyses/027-pomodoro-degradation-and-event-leak.md))
4. **The team produces a small-team shape, including things the
   directive never asked for** — ADRs with named tradeoffs,
   persona-grounded specs, accessibility coverage that wasn't
   requested. Production shape as a derived property of constitutional
   grounding.
   ([analyses 034](./analyses/034-tdd-serial-phased-first-run.md),
   [035](./analyses/035-tdd-phased-teams-2hg-first-run.md))
5. **Friction is the substrate, not the inefficiency.** Every meeting
   in the workflow is engineered friction with a specific shape; §VIII
   puts that friction inside each constitution. Generic stacks have
   nothing analogous because they have roles, not characters; goals,
   not voices; consensus, not constitutions.

Full argument with the per-corollary mechanics in
[**THESIS.md**](./THESIS.md).

The framing the project is building around: *failures are how software
gets built.* The iterative cycle of ship-then-discover-then-fix depends
on recognizing what went wrong; agents whose failure modes are part of
their identity can participate in that cycle as colleagues, not as tools
that need supervising out of their bad habits.

## How it works end-to-end

The operator's flow through Wonderland mirrors the four-phase
arc of an actual software project, with each phase grounded in
the previous one's artifacts:

1. **Discovery** — Alice, Cheshire Cat, and White Rabbit each run
   a short interview (personas, technical constraints, scope +
   success criteria). The substrate writes the answers to disk as
   structured ``requirement`` artifacts. The whole loop is ~12
   minutes of operator attention; every later workflow seeds from
   this corpus rather than re-asking what the project is for.
2. **Planning** — the ``milestone-plan`` workflow organizes the
   captured requirements into 3-7 ordered milestones, each
   declaring ``consumes_requirements`` + ``done_when`` criteria.
   A substrate-level coverage check runs at the end of each rotation:
   any decomposable requirement not assigned to a milestone fires
   a synthetic Dodo observation nudging the team to revise.
3. **Design (per milestone)** — ``tdd-design --milestone <slug>``
   composes stories from the milestone's requirements (M1), turns
   them into features (M2), decomposes features into tickets (M3),
   negotiates architecture (M4) and per-feature contracts (M5).
   M2 runs its own coverage check verifying every milestone
   requirement is realized by a feature before the meeting closes.
4. **Implementation (per ticket)** — ``tdd-implement`` opens
   per-ticket meetings where the Tweedles write red tests, ship
   code that turns them green, and the Caterpillar reviews the
   diff. The working tree IS the implementation artifact; review
   is against `git diff`, not a parallel metadata utterance.
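
The planning-phase coverage check from step 2 can be pictured as a set difference. This is a minimal sketch, not the substrate's actual code: the `Requirement` and `Milestone` dataclasses, the `decomposable` flag, and the example slugs are assumptions made for illustration; only the `consumes_requirements` field name comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Requirement:
    # Hypothetical shape; the real artifact schema may differ.
    slug: str
    decomposable: bool = True


@dataclass
class Milestone:
    slug: str
    consumes_requirements: list[str] = field(default_factory=list)


def uncovered_requirements(
    requirements: list[Requirement], milestones: list[Milestone]
) -> list[str]:
    """Return slugs of decomposable requirements that no milestone consumes."""
    assigned = {slug for m in milestones for slug in m.consumes_requirements}
    return [
        r.slug for r in requirements if r.decomposable and r.slug not in assigned
    ]


# Illustrative data only.
reqs = [
    Requirement("notes-crud"),
    Requirement("markdown-preview"),
    Requirement("persona-busy-writer", decomposable=False),
]
plan = [Milestone("m1-core", consumes_requirements=["notes-crud"])]

print(uncovered_requirements(reqs, plan))
```

A non-empty result is the condition under which the text above says a synthetic Dodo observation would fire; how that observation is actually emitted is outside this sketch.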

Cross-cutting through all four phases: agents can ``retract``
artifacts they shipped earlier when they realize they drifted
off-scope. A workflow-level kill-list blocks speech_acts from
leaking between phases (e.g., a stray ``milestone_plan`` utterance
during ``tdd-design`` is silently stripped). The project
dashboard derives the current phase from disk and surfaces the
next-recommended workflow as a one-line CTA.
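
One way to picture the workflow-level kill-list is as a per-workflow set of blocked speech acts applied as a filter. The sketch below is illustrative only: the `Utterance` dataclass, the `KILL_LIST` mapping, and `strip_leaks` are hypothetical names, and the single entry mirrors the `milestone_plan` example given above.

```python
from dataclasses import dataclass

# Hypothetical kill-list: speech acts that must not surface in each workflow.
KILL_LIST: dict[str, frozenset[str]] = {
    "tdd-design": frozenset({"milestone_plan"}),
}


@dataclass(frozen=True)
class Utterance:
    speaker: str
    speech_act: str
    body: str


def strip_leaks(workflow: str, utterances: list[Utterance]) -> list[Utterance]:
    """Drop utterances whose speech act is on the workflow's kill-list."""
    blocked = KILL_LIST.get(workflow, frozenset())
    return [u for u in utterances if u.speech_act not in blocked]


turn = [
    Utterance("white_rabbit", "milestone_plan", "Revised trajectory"),
    Utterance("alice", "story", "As a busy writer I want quick capture"),
]
kept = strip_leaks("tdd-design", turn)
print([u.speech_act for u in kept])
```

Workflows without an entry pass every utterance through unchanged, which keeps the filter opt-in per workflow.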

See [`projects/discovery2/.wonderland`](./projects/discovery2)
(or any of the discovery* projects) for the actual artifact
shapes the team produces.

## Status

In-progress, building in public. The vertical slice through
discovery → planning → design lands in 0.6.0; implementation
through a full milestone is the next demo target. P7 (evals)
will close out the phase numbering once the lifecycle stabilizes.

- [x] **P1 — Foundation** ([overview](./WONDERLAND_SPEC.md#12-build-order))
      Schema, Caucus event bus, episodic memory, agent base class.
- [x] **P2 — First Voice** ([analysis 001](./analyses/001-first-voice.md))
      The Cheshire Cat comes online and produces ADRs in his own voice.
- [x] **P3 — First Tension** ([analysis 002](./analyses/002-first-tension.md))
      White Rabbit joins the bus. Semantic + relational memory layers.
      Compaction-as-agent-behavior — agents reflect between threads.
- [x] **P4 — First Race** ([analysis 003](./analyses/003-alice-alone.md) ·
      [analysis 004](./analyses/004-first-race.md))
      Alice agent. Dodo orchestrator with quiescence detection. Conflict
      resolution, composition, and human-in-the-loop escalation. Showcase 1:
      a `/health` endpoint directive runs end-to-end against live Haiku 4.5,
      reaching settlement because the team goes quiet — no human in the loop.
- [x] **P5 — Full Cast**
      The remaining six characters land: Mad Hatter (test scenarios),
      Caterpillar (code review), Queen of Hearts (security/compliance
      rulings), Dormouse (production observations), and the Tweedles
      (frontend + backend implementation, with a shared pair protocol).
      `wonderland init` CLI ships the `.wonderland/` skeleton. The
      synthetic-consensus guard observes the bus for the §11 anti-pattern:
      three or more distinct constitutional domains converging on the same
      position is suspect, and the guard surfaces it for review.
- [x] **P6 — Real Threads**
      The hard showcases plus the substrate maturation that made them
      work. **Showcase 2: translation chat MVP** ([analyses
      015](./analyses/015-tweedles-ship-real-code.md)–[018](./analyses/018-the-breakthrough.md))
      — 1580 lines shipped from a vague directive in $0.93. **Showcase 3:
      security recovery** ([analysis 019](./analyses/019-security-recovery.md))
      — reactive response to a synthesized credential-stuffing
      incident; the framework's first non-greenfield run. **Showcase 4:
      multi-session persistence** ([analyses
      020](./analyses/020-multi-session-persistence.md)–[023](./analyses/023-quiescence-and-split-phases.md))
      — memory compounding across sessions, plus the substrate fixes that
      closed the "Tweedles don't ship" bug class (turn-based quiescence
      replacing wall-clock, parse-retry recovering malformed LLM
      responses, working-tree-as-implementation-artifact).
      **Workflow-as-data extraction** ([analysis 024](./analyses/024-tdd-validated.md)):
      meeting chains live in `closet/workflows/` as YAML; the TDD variant
      becomes the recommended default for feature work, with canonical
      retained for fast iteration when directive drift is unlikely. **The
      pair protocol pays off** in
      [analysis 025](./analyses/025-tea-party-validated.md): Alice and the
      Mad Hatter are paired in M4 (the tea party — they're already paired
      in the source material) so Alice writes user-journey scenarios and
      Hatter writes failure-mode scenarios, each producing both the
      artifact and a runnable pytest file. The Geocities showcase ships
      1841 lines from a vague directive in 7m 38s for $2.05, with
      Caterpillar catching three production-blocking bugs in the diff —
      the framework's bug-discovery surface working as designed. See
      [`SHOWCASE.md`](./SHOWCASE.md) for the friend-facing tour and
      [`analyses/`](./analyses) for the full build log of the iteration
      that got the framework here.
- [x] **P8.1 — Observer API** & **P8.2 — TUI Inspector** *(0.0.1 release)*
      `HistoricalRunHandle` reads any snapshot directory; the
      Textual TUI ships as a read-only run inspector with
      lazygit-style multi-pane drill-down (snapshot library →
      run summary → meeting detail → utterance modal →
      artifact browser, plus Cast view + theme cycling).
- [x] **P8.3 — Streaming + Mock Turtle**
      `RunHandle.stream_events()` async-iterator interface +
      `MockTurtleHandle` that replays a snapshot at compressed
      clock time. This is the testbed that P8.4's live-watch screen
      iterates against without API spend.
- [x] **P8.4 — Live-watch screen**
      The streaming surface goes visual: meetings ribbon fills
      in as `MeetingStarted` events arrive, transcript scrolls
      live, body preview tracks the utterance cursor, artifacts
      pane filters per-meeting. All three panes focusable, Tab
      cycles, lazygit principle throughout. Iterated entirely
      against Mock Turtle replay.
- [x] **P8.5 — Directive issuing + LiveRunHandle** *(this release)*
      `NewRunScreen` is the directive composer (preset picker
      with bundled `pomodoro` / `hello-endpoint` /
      `translation-chat` / `geocities` / `ping` directives, plus
      per-project saves to `.wonderland/directives/`); selection
      drives composer + workflow pre-fill; Enter steps through
      the form like a paper form. `LaunchConfirmationScreen`
      guards the irreversible spend with directive preview +
      soft-cap budget. `LiveRunHandle` wraps a real Runner+Caucus
      and emits through the same streaming protocol Mock Turtle
      uses, so the live-watch screen consumes a real run
      interchangeably with a replay. Settings screen accepts
      API key + model from inside the TUI — fresh `pip install`
      users no longer drop to the shell to write a config file.
      First end-to-end TUI run shipped a story for $0.0119
      against the smoke workflow.
- [x] **P8.6 — New-project spinup**
      Skeleton picker + stack-detection routine for adopting
      existing non-Wonderland projects. Closes the on-ramp from
      "I have a project idea" to "team's ready to start" without
      leaving the TUI.
- [x] **P8.7 — First-use polish**
      Welcome screen, error states, abort flow, quiescence
      indicator.
- [x] **P11 — Projects: first-class project model + dashboard**
      ``Project`` registered in a JSON registry at
      ``~/.wonderland/projects.json``; per-project dashboard
      becomes the operator's primary attention surface.
      ``project.yaml`` carries stack-as-runtime-fact the team
      consults at every meeting.
- [x] **P12 — Feature lifecycle + workflow composability**
      Split design / implement / verify into separate workflow
      atoms. Per-feature lifecycle states (proposed → in_design
      → designed → queued → in_progress → ready_review →
      verified). Operator batches: queue several features in
      designed state, run ``tdd-implement`` against the queue.
- [x] **P13 — Cross-run stability**
      Iterating ``tdd-design`` + ``tdd-implement`` against the
      same project becomes the expected mode of operation. Seed
      fallback reads prior runs' artifacts off disk; emission
      transitions fire per-utterance so dashboard state tracks
      reality without backfill races.
- [x] **P14 — Discovery**
      Requirements gathering as a first-class pre-design phase.
      Three-interview flow (Alice → personas / situations,
      Cheshire Cat → constraints / integrations, White Rabbit →
      scope / success criteria). Disk-mediated operator bridge
      so interviews don't burn rotation budget while operators
      think. Wall-clock unbounded.
- [x] **P15 — Milestones**
      ``milestone-plan`` workflow organizes captured requirements
      into a sequenced milestone trajectory. ``tdd-design
      --milestone <slug>`` scopes design to one milestone. Two
      closed-loop coverage checks: (a) every decomposable
      requirement lands in a milestone, (b) every milestone
      requirement is realized by a feature. Retract primitive
      lets agents walk back off-scope artifacts; workflow-level
      kill-list blocks cross-workflow speech_act leakage.
      Dashboard surfaces the derived lifecycle phase + the
      next-recommended workflow as a one-line CTA.
- [ ] **P7 — Evals**
      Generic-baseline vs Wonderland comparison. The compounding
      curve. Reordered after the lifecycle stabilizes so the
      eval harness has a usable end-to-end surface.
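
The per-feature lifecycle introduced in P12 reads naturally as a small linear state machine. The following is a sketch under stated assumptions rather than the runtime's implementation: the state names come from the P12 entry above, while `FeatureState`, `advance`, and `implement_queue` are hypothetical helpers, and the rejection path is left out for brevity.

```python
from enum import Enum


class FeatureState(str, Enum):
    PROPOSED = "proposed"
    IN_DESIGN = "in_design"
    DESIGNED = "designed"
    QUEUED = "queued"
    IN_PROGRESS = "in_progress"
    READY_REVIEW = "ready_review"
    VERIFIED = "verified"


# Enum iteration follows definition order, which matches the lifecycle order.
_ORDER = list(FeatureState)


def advance(state: FeatureState) -> FeatureState:
    """Return the next lifecycle state; verified is treated as terminal."""
    index = _ORDER.index(state)
    if index == len(_ORDER) - 1:
        raise ValueError(f"{state.value} is a terminal state")
    return _ORDER[index + 1]


def implement_queue(features: dict[str, FeatureState]) -> list[str]:
    """Names of features the operator has queued for an implementation run."""
    return [name for name, state in features.items() if state is FeatureState.QUEUED]


board = {
    "note-editor": FeatureState.QUEUED,
    "tag-search": FeatureState.DESIGNED,
    "export-pdf": FeatureState.QUEUED,
}
print(implement_queue(board))
```

Batching then amounts to moving several `designed` features to `queued` and handing the resulting list to one `tdd-implement` run.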

[`WONDERLAND_SPEC.md`](./WONDERLAND_SPEC.md) is the design document.
[`constitutions/`](./constitutions) holds each character's identity in plain
text — these are the actual identity artifacts the runtime loads.

## Try it

Two demo scripts run live against the Anthropic API. You'll need an API
key (see *Configuration* below).

```bash
# A single Cheshire Cat reflecting on a directive
uv run python scripts/cat_demo.py

# Cat + Rabbit on the same bus, with optional compaction afterward
uv run python scripts/two_agent_demo.py --compact
```

Both scripts publish a translation-chat directive by default; pass
`--directive "..."` to use your own.

## The TUI

`wonderland-tui` is the operator interface. Register a project,
queue features for the team, watch them work in real time, verify
or reject what they ship. The same screen that renders live runs
also replays past ones at compressed clock time, so iterating on
the UX never costs API tokens.

```bash
pip install wonderland-ai
wonderland-tui                       # opens the project library
```

First-run flow: the library opens empty. Open `Settings`, paste an
Anthropic API key (saved to your platform's user-config dir), back
out. Press `n` to create a project — pick a path, pick a skeleton
(`python-tui`, `python-cli`, `python-fastapi`, `react-vite`,
`fullstack-fastapi-react`), optionally seed the project's prime
directive from a demo preset (squathero, pomodoro, geocities, …).
On confirm, the TUI offers to launch the discovery workflow
immediately: choosing yes lands you on the live-watch screen, where
the first interview modal pops up; choosing later opens the project
dashboard instead. ``project.yaml``, which carries the
stack-as-runtime-fact, gets written either way so the team has
runtime context at every meeting.

After discovery completes, the dashboard's phase badge reads
``PLANNING — N requirements captured`` with a "Run milestone-plan"
button right above the project context. After planning, the badge
reads ``DESIGN — Milestone N (M of K designed)`` with a "Design
milestone: <slug>" button at the bottom of the milestone detail
pane. The substrate computes the phase from disk every refresh,
so an operator returning to a project after a week sees what's
next without re-orienting.
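
Deriving the phase from disk on every refresh can be thought of as picking the earliest phase whose exit condition is not yet met. The sketch below is a guess at the shape of that logic, not the substrate's rules: the `ProjectCounts` summary and every threshold in `derive_phase` are assumptions, and only the phase names are taken from the dashboard.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    DISCOVERY = "DISCOVERY"
    PLANNING = "PLANNING"
    DESIGN = "DESIGN"
    IMPLEMENTATION = "IMPLEMENTATION"
    COMPLETE = "COMPLETE"


@dataclass(frozen=True)
class ProjectCounts:
    # Hypothetical summary of what a scan of .wonderland/ might yield.
    requirements: int
    milestones: int
    milestones_designed: int
    features: int
    features_verified: int


def derive_phase(c: ProjectCounts) -> Phase:
    """Return the earliest phase whose exit condition is not yet satisfied."""
    if c.requirements == 0:
        return Phase.DISCOVERY
    if c.milestones == 0:
        return Phase.PLANNING
    if c.milestones_designed < c.milestones:
        return Phase.DESIGN
    if c.features_verified < c.features:
        return Phase.IMPLEMENTATION
    return Phase.COMPLETE


after_discovery = ProjectCounts(12, 0, 0, 0, 0)
print(derive_phase(after_discovery).value)
```

Because the function is pure and takes only counts, recomputing it on each refresh is cheap and needs no cached state.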

The screens, in the order an operator typically meets them:

- **Project library** — your projects with metadata. `n` for new,
  Enter to open the dashboard, `s` for settings.
- **New project** — name, path, prime directive (with a demo
  picker that populates the composer from bundled directives),
  skeleton picker. On submit the TUI offers to jump straight
  into the discovery workflow — the natural first move on a
  fresh project. Skeleton apply lays down a working scaffold AND
  writes `project.yaml` so M4 architecture and M5 contracts
  ground in the runtime fact, not just the directive's prose.
  Existing non-bare projects get a retrofit path that writes
  `project.yaml` without clobbering existing files.
- **Per-project dashboard** — the operator's primary attention
  surface. Phase badge at the top derives the project's current
  state (`DISCOVERY` / `PLANNING` / `DESIGN — Milestone N` /
  `IMPLEMENTATION` / `COMPLETE`) from disk + names the next
  recommended workflow. Below: Milestones pane (left) with a
  collapsible tree of milestones + their consumed requirements +
  a Cross-cutting node for personas / situations / out-of-scope
  reqs that don't belong to any specific milestone. Selecting
  a milestone filters the Features pane (right) to features
  that realize that milestone's requirements via the
  story → feature chain. Feature pane has state filter chips
  (designed / queued / ready_review / in_progress / verified /
  rejected) that stack with the milestone filter. Runs row sits
  below as the always-visible history reference. State-aware
  action buttons (`Design`, `Implement`, `Verify`, `Custom run`)
  surface counts for what's actionable.
- **Lifecycle moves from the dashboard** — `q` queues a designed
  feature for implementation; `Verify` opens a modal that captures
  the operator's verdict with optional notes (verified / rejected
  → recorded in `.wonderland/feature-states.jsonl` for next-run
  context); `m`/`D` mark and bulk-delete duplicate tickets when
  Rabbit's M3 ships revision-pass redundancy. The Milestone
  detail pane offers a "Design milestone: <slug>" CTA at the
  bottom when the highlighted milestone has zero realizing
  features yet — operator-driven path into ``tdd-design`` with
  the scope pre-filled.
- **New run composer** — preset picker (left) + directive editor
  (right) + workflow / budget / project-root config + inline
  save-as-preset form. Bundled directives: `pomodoro`,
  `hello-endpoint`, `translation-chat`, `geocities`, `ping`.
  Per-project presets live at `<project>/.wonderland/directives/`.
  Empty directives push a confirmation modal so a launch doesn't
  silently ship without intent.
- **Live-watch screen** — three focusable panes (lazygit-style):
  meetings ribbon (with per_item iteration discriminators for
  parallel and pipeline workflows), transcript table + body
  preview pane, artifacts table. Selection filters across panes;
  status bar shows current speaker, live cost ticker, watching
  elapsed time + source-time elapsed. Same screen consumes live
  runs (`LiveRunHandle`) and replays of captured runs
  (`MockTurtleHandle`) interchangeably.
- **Operator-question modal** — when an agent emits a
  `question_to_operator` (architectural ambiguity that contracts
  can't resolve, business priority calls, schema-vs-directive
  conflicts), the framework pauses the meeting and surfaces the
  question as a modal. Your reply lands on the bus as an
  `observation` from the operator identity; the meeting resumes
  with the team seeing the answer in their context.
- **Interview modal** — surfaced during discovery workflow when
  Alice / Cheshire Cat / White Rabbit ship their question
  batches. One widget per question kind (free-text TextArea,
  single-choice RadioSet, multi-choice Checkbox group, numeric
  Input). Submit / Skip section / Cancel exits; required fields
  block submit until filled. The substrate writes answers to
  ``.wonderland/runs/<id>/pending_interview_answers.json`` and
  the interviewer's next turn synthesizes them into
  ``requirement`` artifacts. Wall-clock unbounded — interviews
  don't burn rotation budget while the operator's away.
- **Cast view** — single-page lazygit shape: character list at
  top, bio + constitution side-by-side below. Bios cover both
  the literary character and how it shapes each agent's
  constitution. Useful for understanding why an agent made a
  particular call when reviewing a captured run.
- **Settings** — Anthropic API key (password-masked, persists to
  the user-config dir) + optional model override. Reachable from
  the library, also auto-pushed when New run finds no API key
  set so fresh `pip install` users have a one-click recovery
  path.
- **Theme cycling** — `t` rotates through four Wonderland-flavored
  palettes (Tea Party / Looking Glass / Trial / Caucus); built-in
  Textual themes (gruvbox, dracula, nord, …) remain available.
- **Vim navigation** throughout — `j`/`k` to move, `g`/`G` and
  `H`/`L` for top/bottom, `Enter` to drill in / advance, `Tab` to
  cycle focus across panes, `Escape` to back out. Per-screen
  bindings show in the footer.

The replay-first design carries forward: it drives the smoke tests,
keeps UX iteration free of API spend, and means anyone curious
about how the framework actually behaves can `wonderland-tui` →
open a project → drill into Runs → press `w` on a snapshot to
watch a captured run play back at 5× speed. Project context, the
features-as-tree dashboard, the verify/reject modal, and the
operator-question pipeline are the P11/P12 additions that pulled
the framework from "watch a run happen" to "drive a project's
feature lifecycle through several runs."
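
Replaying a captured run at compressed clock time comes down to sleeping for each recorded gap divided by a speed factor. The sketch below shows that idea with a plain async generator; it is not `MockTurtleHandle` itself. The `Event` record, the `replay` function, and the `Utterance` / `MeetingEnded` labels are assumptions, and only `MeetingStarted` is an event name mentioned in this README.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical record: seconds since run start plus an event label.
    at: float
    kind: str


async def replay(events: list[Event], speed: float = 5.0) -> AsyncIterator[Event]:
    """Yield recorded events in order, sleeping for the scaled gap between them."""
    previous = 0.0
    for event in sorted(events, key=lambda e: e.at):
        await asyncio.sleep((event.at - previous) / speed)
        previous = event.at
        yield event


async def collect(events: list[Event], speed: float) -> list[str]:
    """Drain the replay and return the event kinds in arrival order."""
    return [event.kind async for event in replay(events, speed)]


recorded = [
    Event(0.0, "MeetingStarted"),
    Event(0.2, "Utterance"),
    Event(0.5, "MeetingEnded"),
]
kinds = asyncio.run(collect(recorded, speed=50.0))
print(kinds)
```

A consumer that only sees an async iterator of events cannot tell a replay from a live source, which is the property the live-watch screen relies on.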

## Project layout

```
wonderland-ai/
├── WONDERLAND_SPEC.md      # The design document
├── THESIS.md               # Long-form thesis (architectural claim + corollaries)
├── constitutions/          # Each character's identity, version-controlled
├── src/wonderland/         # The runtime
│   ├── closet/             # Data the team reaches for at runtime
│   │   ├── skeletons/      # Project skeletons the team builds on top of
│   │   └── workflows/      # Meeting-chain templates (canonical, tdd, smoke)
│   └── ...                 # agent.py, runner.py, caucus.py, workflow.py, ...
├── scripts/                # Demo scripts; workflow_demo.py runs any bundled workflow
├── analyses/               # Field notes on the thesis as it gets stress-tested
├── tests/
└── .daedalus/              # Daedalus' working memory for this project
```

A target project that runs Wonderland gets a `.wonderland/` directory of
its own — per-agent episodic/semantic/relational memory plus the artifacts
the team produces across the lifecycle: requirements (from discovery),
milestones (from planning), stories, features, tickets, ADRs, contract
notes, test scenarios, implementations, reviews. The runtime here is
project-agnostic; per-project state lives with the project.

```bash
wonderland init [path]   # create the .wonderland/ skeleton; idempotent
```

`init` creates `requirements/`, `milestones/`, `stories/`, `features/`,
`tickets/`, `architecture/`, `escalations/`, and `memory/` plus a README
documenting the layout. Re-running is safe — existing artifacts and a
user-edited README are left alone.
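
The idempotence described here can be approximated with `exist_ok=True` directory creation plus a write-once README. This is a minimal sketch, assuming the README is named `README.md` and that any existing README is left alone (the real command may treat an unedited README differently); `init_skeleton` is a hypothetical helper, not the CLI's entry point.

```python
from pathlib import Path

# Directory names taken from the list above.
SUBDIRS = (
    "requirements", "milestones", "stories", "features",
    "tickets", "architecture", "escalations", "memory",
)


def init_skeleton(project_root: Path) -> Path:
    """Create the .wonderland/ skeleton without touching existing files."""
    root = project_root / ".wonderland"
    for name in SUBDIRS:
        # exist_ok makes a second run a no-op for directories already present.
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"  # file name is an assumption
    if not readme.exists():
        readme.write_text("# .wonderland\n\nLayout notes go here.\n", encoding="utf-8")
    return root
```

Calling `init_skeleton(Path("."))` twice leaves the second call with nothing to do, so artifacts written between the two calls survive.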

## Install

Distribution name on PyPI is `wonderland-ai`; the import path stays
`import wonderland`. Core install includes the TUI (the primary
user-facing surface) and the in-process bus:

```bash
pip install wonderland-ai           # core + TUI
pip install 'wonderland-ai[redis]'  # adds RedisCaucus
```

`RedisCaucus` requires the `redis` extra; constructing one without it
raises `ImportError` with an install hint.
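
A common way to get this behavior is to defer the optional import until construction and re-raise with the install command. The sketch below illustrates the pattern only: `require_extra` and `RedisBackedBus` are hypothetical names, and the real `RedisCaucus` may word its hint differently.

```python
import importlib
from types import ModuleType


def require_extra(module: str, extra: str, dist: str = "wonderland-ai") -> ModuleType:
    """Import an optional dependency or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"{module!r} is not installed; run: pip install '{dist}[{extra}]'"
        ) from exc


class RedisBackedBus:
    # Hypothetical stand-in for a Redis-backed bus, not the real RedisCaucus.
    def __init__(self, url: str) -> None:
        # The import is deferred so a core install can still import this module.
        self._redis = require_extra("redis", "redis")
        self._url = url
```

Keeping the import inside the constructor means users of the in-process bus never pay for, or trip over, the missing dependency.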

## Configuration

Wonderland reads user-level config (API keys, model overrides) from a
JSON file at the platform-appropriate location:

| OS      | Path                                                          |
|---------|---------------------------------------------------------------|
| Linux   | `~/.config/wonderland/config.json` (honors `XDG_CONFIG_HOME`) |
| macOS   | `~/Library/Application Support/wonderland/config.json`        |
| Windows | `%APPDATA%\wonderland\config.json`                            |

```json
{
  "anthropic": {
    "api_key": "sk-ant-...",
    "model": "claude-haiku-4-5-20251001"
  }
}
```

API-key resolution order: explicit constructor arg → `ANTHROPIC_API_KEY`
env var → config file. If set, the env var wins over the config file.
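
The resolution order can be expressed as a short first-match chain. This is an illustrative sketch, not the library's loader: `resolve_api_key` is a hypothetical function, the config path is passed in rather than computed per platform, and the JSON layout follows the example above.

```python
import json
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional


def resolve_api_key(
    explicit: Optional[str],
    config_path: Path,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first key found: explicit arg, then env var, then config file."""
    if explicit:
        return explicit
    if env.get("ANTHROPIC_API_KEY"):
        return env["ANTHROPIC_API_KEY"]
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        return config.get("anthropic", {}).get("api_key")
    return None
```

Passing `env` as a parameter keeps the function easy to test without mutating the process environment.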

## Development

```bash
uv sync --extra dev   # includes redis for full test coverage
uv run pytest
uv run ruff check
uv run ruff format
```

Live LLM tests are gated behind `WONDERLAND_LLM_SMOKE=1` and skipped
otherwise; running them costs Anthropic API tokens. Redis-backed tests
are gated behind `WONDERLAND_REDIS_URL`. To exercise both:

```bash
docker run -d --name wonderland-redis -p 6379:6379 redis:7-alpine
WONDERLAND_REDIS_URL=redis://localhost:6379 \
WONDERLAND_LLM_SMOKE=1 \
  uv run pytest
```

## Sponsoring

Wonderland runs on a personal Anthropic budget — one person, one API
key. The architecture is designed to be cheap (small models, heavy
caching) but multi-agent runs at scale still add up. If any of my
work has been useful to you — to read, build on, or argue with —
[GitHub Sponsors](https://github.com/sponsors/KohlJary) keeps the
Cheshire Cat in tea and the Hatter in scenarios.

## License

[MIT](./LICENSE).
