Metadata-Version: 2.4
Name: neurolisp
Version: 0.35.0
Summary: Sexpr-native MCP server giving Claude Code and other MCP agents white-box workflow orchestration, durable corpus, auto-crystallization.
Author: Bangcheng Wang, NeuroLisp contributors
License: Apache-2.0
Project-URL: Homepage, https://github.com/KevinBangbang/NeuroLisp
Project-URL: Documentation, https://github.com/KevinBangbang/NeuroLisp/blob/main/docs/README.md
Project-URL: Repository, https://github.com/KevinBangbang/NeuroLisp
Project-URL: Changelog, https://github.com/KevinBangbang/NeuroLisp/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/KevinBangbang/NeuroLisp/issues
Keywords: mcp,lisp,agent,orchestration,claude-code,llm,workflow
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Interpreters
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: hypothesis>=6.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Provides-Extra: pdf
Requires-Dist: pypdf>=4.0; extra == "pdf"
Dynamic: license-file

# NeuroLisp

**A white-box MCP server for Claude Code agent workflows.** Survive a `kill -9` mid-task. Read what your subagent did in plain text. Replay any step without rerunning the rest.

[![tests](https://github.com/KevinBangbang/NeuroLisp/actions/workflows/test.yml/badge.svg)](https://github.com/KevinBangbang/NeuroLisp/actions/workflows/test.yml)
[![python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue)](https://www.python.org/)
[![license](https://img.shields.io/badge/license-Apache%202.0-green)](LICENSE)
[![mcp](https://img.shields.io/badge/protocol-MCP-purple)](https://modelcontextprotocol.io/)

---

## Why does this exist

Claude Code is powerful in one session. The 4 things that break in production are paired with what NeuroLisp does about each:

| The pain | NeuroLisp's answer |
|---|---|
| You crash 8 hours in and lose all state. | Every step's cursor, prompt, and result lives in sqlite. Reconnect and `nl_workflow_replay`. |
| You cannot audit what your subagent did. LangGraph objects inspect to `<Node 0x...>`. | Every step is sexpr text in `workflow_runs.steps_sexpr`. Readable with `cat`. |
| Your patterns repeat but the agent forgets each time. | After 3 consistent observations, the pattern auto-crystallizes into a reusable named skill. |
| You want to fix one step without rerunning everything. | `nl_workflow_patch_step <id> <step> <new-prompt>`, then replay from that step only. |

The file you executed is the file you patch. The same sexpr is workflow, plan, template, and macroexpansion result. That is homoiconicity, and the value is concrete: you can read, diff, edit, and replay without ever leaving plain text.

---

## Why smaller models can do bigger work

`CLAUDE.md` and similar instruction files are an **interpretive runtime**. Every step, the model re-reads your rules, decides which skill to load, recalls past context, and improvises what to do next. That requires a very capable model. The cost grows with workflow complexity.

NeuroLisp's sexpr workflow is a **deterministic runtime**. Step order, tool whitelists, lexical scope, retry guards, pre-loaded skills, the briefing kit — all encoded in the sexpr **before** the model is invoked. The model only does the leaf work: draft this paragraph, classify this review, summarize these sources. The orchestration is the program, not the prompt.

```
CLAUDE.md interpretation                NeuroLisp deterministic orchestration
─────────────────────────                ────────────────────────────────────
Big model reads instructions             Small model receives a fully-formed
Big model decides next step              briefing for ONE leaf task
Big model loads context                  Workflow already loaded the context
Big model picks the tool                 Workflow already locked the tools
Big model improvises                     Sexpr executes deterministically
Cost scales with model + complexity      Cost scales with leaf-task count only
```

Practical effect: a 5-step essay pipeline that needs a top-tier model to coordinate via instruction-file interpretation runs on a smaller, cheaper model under NeuroLisp because the coordination is in the sexpr. Same output quality, often two orders of magnitude cheaper per LLM call:

```
DeepSeek v4-flash    $0.07 / 1M input tokens     $0.27 / 1M output tokens
Claude Opus 4.x      $15   / 1M input tokens     $75   / 1M output tokens
                     ~214× input                  ~278× output
```

Public list prices as of 2026-05. The 5-step essay benchmark in this repo lands at ~$0.003 per essay on DeepSeek v4-flash; the same workflow under a top-tier model would be in the $0.50 - $1.00 range.

This compounds. As you accumulate dozens of reusable pipelines and skills, you have a personal orchestration layer that any cheap model can drive. **The intelligence migrates from the model into your workflow library**, where it is inspectable, diff-able, and version-controlled.

---

## In 30 seconds

**Run a real workflow.** One macro expands into 5 LLM steps: planner → 2 parallel researchers → 2 writers → reflect-revise editor.

```lisp
(essay-atom-pipeline-scoped "GraphQL vs REST in 2026" "research-team")
;; → 4000+ word essay, ~$0.002-0.005 on DeepSeek v4-flash
```

**Inspect what ran.** Workflow + steps are sqlite rows, not opaque objects.

```bash
sqlite3 ~/.neurolisp_mcp.sqlite \
  "SELECT id, status, cursor, length(steps_sexpr) FROM workflow_runs ORDER BY id DESC LIMIT 1"
# wf-9b2e... | complete | 5/5 | 842 chars of plain sexpr
```

**Survive a crash.** Pull the plug at step 4 of 5. Reconnect Claude Code and ask it to:

```
nl_workflow_replay(workflow_id="wf-9b2e...", from_step="draft-section-2")
# resumes at draft-section-2; outline + research-1 + research-2 reuse
# cached results from sqlite, cost: $0 for the first 3 steps
```

**Hand-edit a step then replay.**

```
nl_workflow_patch_step(workflow_id="wf-9b2e...",
                       step_name="reflect-revise",
                       new_prompt="Reflect using 7 quality dimensions, then revise.")
nl_workflow_replay(workflow_id="wf-9b2e...", from_step="reflect-revise")
```

No framework rerun. No LangGraph rebuild. Patch the sexpr and go.

---

## How a complex workflow actually runs

Take the essay-atom-pipeline from the example above. It is intentionally complex enough to exercise every NeuroLisp mechanism: `:auto T` (server-side LLM), `:tools` deferred subagents (brain-side dispatch), a `parallel` group, a `:retry-validate` quality guard, a `:sink` length backstop, and `:scope` lexical-tool whitelisting through a NodeProfile. After the one-line macro call, here is what actually happens.

### One user request, five MCP boundaries

```
                         USER
                           │  "Write me an essay on GraphQL vs REST."
                           ▼
                  ┌──────────────────┐
                  │   Claude Code    │  main agent in user's terminal
                  │   (the brain)    │  decides to use NeuroLisp
                  └────────┬─────────┘
                           │
                           │  nl_eval_sexpr(
                           │    '(essay-atom-pipeline-scoped
                           │      "GraphQL vs REST in 2026"
                           │      "research-team")')
                           ▼
                  ┌──────────────────┐
                  │  NeuroLisp MCP   │  parses sexpr, expands macro,
                  │     server       │  walks workflow groups, persists
                  │   (Python)       │  state to ~/.neurolisp_mcp.sqlite
                  └─┬────────────┬───┘
                    │            │
       ┌────────────┘            └────────────┐
       │ for :auto T steps        for :tools  │
       │ server calls LLM         server      │
       │ provider directly        hands back  │
       ▼                          deferred    ▼
┌─────────────────┐               token   ┌────────────────┐
│ LLM provider    │                       │ Claude Code    │
│ DeepSeek /      │◀──────────────────────│ Agent tool     │
│ OpenAI-compat / │   HTTP request        │ dispatches a   │
│ Anthropic       │   includes briefing   │ fresh subagent │
│   (urllib only) │                       │ with the       │
└────────┬────────┘                       │ briefing kit   │
         │ text response                  └───────┬────────┘
         ▼                                        │ subagent
   apply :sink                                    │ runs WebSearch
   apply :retry-validate                          │ + WebFetch
   write to wf.results                            │ + reasoning
   write to corpus row in sqlite                  ▼
         │                                  result text
         │              ┌─────────────────────────┘
         │              │ nl_resolve_subagent(token, result)
         ▼              ▼
              workflow advances cursor
              next group fires
              auto-chain runs all :auto steps in one server call
                       │
                       ▼
              workflow status = complete
              final result in wf.results["reflect-revise"]
                       │
                       ▼
            ┌──────────────────┐
            │   USER reads     │
            │   the essay      │
            └──────────────────┘
```

### Phase-by-phase trace

| # | Phase | Who acts | What actually happens |
|---|---|---|---|
| 1 | Macro expansion | Server | `(essay-atom-pipeline-scoped ...)` → 30-line `(workflow (quote ...) (quote (5 steps)))` AST in memory. No LLM yet. |
| 2 | Group 0: outline | Server + DeepSeek | `:auto T` planner step. Server builds briefing from `essay-outline-architect` skill + topic, HTTP-POSTs DeepSeek, gets 400-word outline, applies sink (none), writes `wf.results["outline"]` + corpus row `auto-step:planner`. |
| 3 | Group 1: research × 2 (parallel + deferred) | Brain → 2 subagents | `:tools (WebSearch WebFetch)` steps emit 2 deferred tokens. Brain receives `parallel_steps` payload, dispatches 2 Claude Code subagents via the `Agent` tool, each with its own briefing kit (Role / Task / Upstream Artifacts / SOP / Tools Available). Subagents call WebSearch + WebFetch independently. Brain receives 2 result strings, calls `nl_resolve_subagent(token, result)` twice. |
| 4 | Auto chain: groups 2-4 | Server + DeepSeek | After the 2nd resolve, server sees the next 3 groups (draft-section-1, draft-section-2, reflect-revise) are all `:auto T`. It runs them back-to-back in a single server-side loop (`auto chain`, v7.62), no brain round-trips. Each step's prompt references upstream step names which the env resolves to actual text. |
| 5 | Final guard | Server | reflect-revise has `:sink (cond ((< (string-length result) 2000) (str "WARNING short essay..." result)) (T result))`. If LLM truncates, the sink prepends a WARNING header before storing. `wf.summaries["reflect-revise"]` also stored for downstream brevity. |
| 6 | Return | Server → Brain → User | Server returns `{complete: true, results: {6 step keys}}`. Claude Code reads `results["reflect-revise"]` and shows the essay to the user. |

### The boundary each layer enforces

- **The brain (Claude Code)** decides *what to ask for* and *whom to dispatch* (subagents). It never decides *how a step is written* — the workflow grammar already encodes that.
- **The server (NeuroLisp)** decides *how each step executes* (auto vs deferred), *what context that step sees* (briefing kit), and *what state persists* (sqlite). It never decides *whom to dispatch* — invariant 3.
- **The LLM provider** does *one leaf task* at a time. Briefing is fully assembled before the HTTP call, so the model is not asked to plan, only to produce.
- **The subagent** is a *one-shot* Claude Code instance: opens a fresh context, runs the tools listed in its briefing, returns one result string. It has no awareness of the broader workflow.

That separation is what lets a small, cheap leaf-model do the same end-to-end work that a single big-model session would otherwise need: every step is pre-decided in the sexpr, so the model is never asked to be smart about the plan.

### What sqlite actually contains after one run

```bash
sqlite3 ~/.neurolisp_mcp.sqlite "
  SELECT primitive, success, length(output) AS out_chars, cost
  FROM corpus
  ORDER BY row_id DESC LIMIT 6"
```

```
invoke-subagent:editor            | 1 | 4149 | 0.00097     -- reflect-revise (:auto T)
invoke-subagent:writer            | 1 | 2104 | 0.00031     -- draft-section-2 (:auto T)
invoke-subagent:writer            | 1 | 2087 | 0.00029     -- draft-section-1 (:auto T)
invoke-subagent:general-purpose   | 1 | 1856 | 0.0         -- research-2 (:tools, brain-side)
invoke-subagent:general-purpose   | 1 | 1734 | 0.0         -- research-1 (:tools, brain-side)
invoke-subagent:planner           | 1 |  412 | 0.00018     -- outline (:auto T)
```

Every step writes a row with primitive prefix `invoke-subagent:<agent>` regardless of execution path. `:auto T` rows carry the server-side LLM cost; `:tools` rows show `0.0` because the cost lives on the brain side (subagent dispatch). `:pure T` steps use a different prefix `pure-step:<agent>`.

```bash
sqlite3 ~/.neurolisp_mcp.sqlite "
  SELECT id, status, cursor, length(steps_sexpr) AS plan_chars
  FROM workflow_runs ORDER BY rowid DESC LIMIT 1"
```

```
wf-9b2e... | complete | 5/5 | 842
```

6 corpus rows are append-only audit trail; the `workflow_runs` row is the resumable snapshot. Both are plain SQL. Both are diff-able. Nothing about this run is opaque.

---

## Get started

Install from source (works today):

```bash
git clone https://github.com/KevinBangbang/NeuroLisp.git
cd NeuroLisp
pip install -e .
python -m neurolisp.health
```

A PyPI release (`pip install neurolisp`) is staged for v0.35.0 and will be available shortly after the first public release tag.

Wire into Claude Code by editing `~/.claude.json`:

```json
{
  "mcpServers": {
    "neurolisp": {
      "command": "python",
      "args": ["-m", "mcp_server.server"]
    }
  }
}
```

Restart Claude Code, run `/mcp`. All 45 `nl_*` tools appear.

For real LLM steps, export an API key:

```bash
export DEEPSEEK_API_KEY=sk-...
# or OPENAI_API_KEY / ANTHROPIC_API_KEY
```

OpenAI-compatible endpoints (Groq, Together, Cerebras, local vLLM, etc.) are supported by swapping `base_url`. See [`docs/00_quickstart.md`](docs/00_quickstart.md) for the 10-minute walkthrough.

---

## What you can do

- **Resume any workflow at the exact step** after a crash, kernel panic, or `kill -9`. Step results live in sqlite, not RAM.
- **Audit every subagent invocation by `cat`-ing a sexpr.** The execution plan is the same file you can hand-edit.
- **Skip steps you have already paid for** with `nl_workflow_replay(workflow_id, from_step="step-name")`. Cached upstream results stay valid.
- **Patch one step then replay.** `nl_workflow_patch_step` updates the plan; the next replay picks up the change.
- **Crystallize a repeated pattern into a named skill** after 3 consistent observations. Reuse it from any future workflow.
- **Skip the heavyweight 5-step pipeline for low-stakes tasks** with `stakes-route`. Empirically 89% cheaper than the full pipeline (v0.34 routine vs full bench).
- **Swap LLM providers without changing the workflow.** DeepSeek, Anthropic, and any OpenAI-compatible endpoint (Groq, Together, Cerebras, local vLLM) all on stdlib `urllib`, no SDK lock-in.
- **Trust the test surface.** 1992 passing tests + 4 skipped on every commit across Ubuntu and Windows on Python 3.10 / 3.11 / 3.12.

---

## Documentation

| If you want to... | Read this |
|---|---|
| Try it in 10 minutes | [`docs/00_quickstart.md`](docs/00_quickstart.md) |
| Understand why each layer exists | [`docs/01_concepts/00_first_principles.md`](docs/01_concepts/00_first_principles.md) |
| Follow a build-up tutorial | [`docs/07_tutorial-book/`](docs/07_tutorial-book/) 17 chapters |
| Browse all 45 MCP tools | [`docs/02_reference/mcp-tools.md`](docs/02_reference/mcp-tools.md) |
| Read the 8 invariants and anti-goals | [`NORTH_STAR.md`](NORTH_STAR.md) |
| See empirical benchmarks | [`docs/BENCHMARKS.md`](docs/BENCHMARKS.md) |
| Run example scripts locally | [`examples/`](examples/) 3 standalone demos |
| Browse version history | [`CHANGELOG.md`](CHANGELOG.md) |

---

## Community

- [GitHub Discussions](https://github.com/KevinBangbang/NeuroLisp/discussions) for design questions, show-and-tell, RFCs
- [GitHub Issues](https://github.com/KevinBangbang/NeuroLisp/issues) for bug reports and feature requests
- Security disclosures: [`SECURITY.md`](SECURITY.md)

This is a 1-maintainer project. Realistic response time: a few days for bug reports with reproducers, longer for open-ended discussions.

---

## Contributing

NeuroLisp is small by design. We review every PR with Occam's razor. New atoms, modules, or workflows must pass real-LLM end-to-end A/B validation before landing on main. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for setup, conventions, and the in-scope / out-of-scope list.

---

## License

[Apache 2.0](LICENSE). Copyright Bangcheng Wang and NeuroLisp contributors.
