Metadata-Version: 2.4
Name: simplicio-prompt
Version: 1.10.1
Summary: Tuple-Space + Yool safe-speed runtime kernel: lazy 1,000,000+ subagent batch_spawn, adaptive lane concurrency, and a dependency-free OpenAI-compatible provider client for real subagents on DeepSeek / MiMo / OpenRouter / local LLMs.
Author: Wesley Simplicio
License: MIT
Project-URL: Homepage, https://github.com/wesleysimplicio/simplicio-prompt
Project-URL: Repository, https://github.com/wesleysimplicio/simplicio-prompt
Project-URL: Issues, https://github.com/wesleysimplicio/simplicio-prompt/issues
Keywords: claude,subagents,tuple-space,yool,hamt,runtime,ai-agents,prompt,deepseek,mimo,openrouter,ollama,llm
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

<p align="center">
  <img src="docs/assets/simplicio-prompt-hero.jpg" alt="simplicio-prompt — AI with 1,000,000+ subagents / IA com 1.000.000+ subagentes" width="100%" />
</p>

# simplicio-prompt

> Capability-addressing pattern: **yool** (atomic action) wrapped in **tuples** (addressable envelopes) over an **HAMT** (Hash Array Mapped Trie) registry, coordinated through a **tuple-space** with **content-addressable receipts**.

This repo is the canonical spec. Vendor it into any project that wants the pattern.

## What's new in 1.5

More features, more reach, and a tighter runtime:

- **Now on PyPI** — `pip install simplicio-prompt` ships the dependency-free
  Python kernel and a `simplicio-subagents` console command, alongside the
  existing npm package.
- **Real subagents on any model / provider** — DeepSeek, MiMo, OpenRouter, or a
  free local LLM (Ollama / vLLM / LM Studio) all run through the same
  `--provider` / `--model` flags. Default fan-out is now **200** subagents;
  `--subagents 600` is opt-in for max-breadth parallel audits.
- **First-class observability** — `PromptFanout` adapters plus per-lane
  token/cost accounting, circuit-breaker state, and cache stats in every
  snapshot.
- **Always-on invocation** — a `UserPromptSubmit` hook routes every message
  through the runtime with no trigger keyword, now including a Gemini CLI target.
- **One-command test suite** — `npm test` runs both the Node CLI/hook tests and
  the full Python kernel suite (44 tests) in a single pass.

## Scope — where simplicio-prompt actually wins

`simplicio-prompt` (sp) is a **runtime for fan-out**, not a wrapper that makes
single LLM calls more accurate. Weigh the evidence honestly before adopting
it:

- **Single-call code generation** — the ONE-SHOT template adds runtime
  framing on top of a normal prompt. Across 4 models × 12 cases × 3 sides
  the bench shows **parity or marginal regression** vs sending the task
  through [`simplicio-cli`](https://github.com/wesleysimplicio/simplicio-cli)
  directly. If your use case is "one model, one prompt, one artifact",
  use `simplicio-cli` alone: sp adds overhead with no measurable lift.
- **Fan-out with consensus aggregation** — this is where sp earns its
  keep. On the same bench, `cli + sp` running **N=200 subagents with
  modal-vote** on `qwen-coder-next-instruct` scored **12/12 PHPUnit
  green**, the only configuration that did so. Modal-vote on smaller
  MoE models can collapse to uniq=1 (a single unique candidate across all
  N outputs) when prompts are bit-identical; that's the bottleneck the v1.11
  diverse-prompt and v1.12 behavior-consensus phases attack.
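Modal-vote aggregation fits in a few lines, which makes the uniq=1 failure mode easy to see: if every candidate is identical, the vote has nothing to choose between. This is a minimal sketch, not the kernel's aggregator; the `modal_vote` name, the whitespace normalization, and the returned `uniq` count are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple


def modal_vote(outputs: List[str]) -> Tuple[str, int, int]:
    """Hypothetical consensus step: return (winner, votes, uniq).

    `uniq` is the number of distinct candidates after normalization;
    uniq == 1 means every subagent answered the same, so the vote adds nothing.
    """
    if not outputs:
        raise ValueError("need at least one subagent output")
    # Collapse runs of whitespace so trivially different outputs count together
    # (an assumption; real consensus may compare behavior, not text).
    normalized = [" ".join(o.split()) for o in outputs]
    counts = Counter(normalized)
    winner, votes = counts.most_common(1)[0]
    return winner, votes, len(counts)


if __name__ == "__main__":
    outs = ["return a + b", "return a+b", "return a + b", "return b + a"]
    print(modal_vote(outs))  # ('return a + b', 2, 3)
```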

The two products are composable, not alternatives:

| Layer | Project | Answer |
|---|---|---|
| WHAT the task is | [`simplicio-cli`](https://github.com/wesleysimplicio/simplicio-cli) | role / stack / goal / target / testable criteria / constraints / output shape (6-layer contract) |
| HOW the agent operates | **`simplicio-prompt`** (this repo) | orchestration, fan-out, real subagents, safe-speed runtime |

For code tasks, send the contract through cli and let sp handle fan-out
when N>1 makes sense.

### Two physically separate templates

The ONE-SHOT and BATCH templates live in separate files so the BATCH content
never enters the model's context in single-artifact tasks:

| Template | File | When |
|---|---|---|
| **ONE-SHOT** (default) | [`prompts/agent-runtime-execution-prompt.md`](prompts/agent-runtime-execution-prompt.md) | Single deliverable per request: file edit, diff, snippet, answer, classification. Lean persona, no tuple-space framing in the body. |
| **BATCH** (opt-in) | [`prompts/agent-runtime-batch.md`](prompts/agent-runtime-batch.md) | Orchestrated fan-out: parallel audits, item-parallel work, 200+ real subagents, brainstorm at scale. Tuple-space primitives, `batch_spawn`, `LaneWorkerPool`, real subagent invocation via `simplicio-subagents`. |

Opt-in paths to BATCH:

- Set `YOOL_TUPLE_FULL_RUNTIME=1` — the Claude Code hook switches to the
  BATCH directive.
- Run `npx simplicio-prompt --batch` / `--batch --raw` to print the BATCH
  template.
- Import `getBatchPrompt()` / `getBatchPromptSection()` from the npm API,
  or use the `simplicio-prompt/batch-prompt` package export.
- Have your coding agent invoke `simplicio-subagents` via shell when the
  task is genuinely fan-out (the BATCH template includes this protocol).
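The environment-variable path boils down to a single switch between the two template files. A minimal Python sketch of that selection, assuming the flag is compared against the literal string `"1"` (the actual hook may accept other truthy spellings); `select_template` is a hypothetical name:

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Template files as listed in the table above.
ONE_SHOT = Path("prompts/agent-runtime-execution-prompt.md")
BATCH = Path("prompts/agent-runtime-batch.md")


def select_template(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: BATCH only when YOOL_TUPLE_FULL_RUNTIME=1, else ONE-SHOT."""
    env = os.environ if env is None else env
    return BATCH if env.get("YOOL_TUPLE_FULL_RUNTIME") == "1" else ONE_SHOT
```

Defaulting to ONE-SHOT keeps the lean template as the safe fallback for any unset or unexpected value.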

## Highlight: 1,000,000+ subagents, zero enumeration

> **`simplicio-prompt` scales to 1,000,000+ subagents in a single task without
> enumerating them, without spawning a million processes, and without melting
> your provider quota.**

It does this with `batch_spawn(depth, branching, compression_threshold)` — a
lazy hierarchical fan-out over a Hilbert-indexed tuple graph. The kernel stores
**virtual-agent counts and content-addressable receipts** instead of a flat
list of agents, so the cost of representing the work is logarithmic, not
linear.

- **`depth=4, branching=32` ⇒ 1,048,576 subagents** materialized only when a
  tuple is actually visited.
- `2,833.75x` faster scale representation than a flat instruction flow
  (V2 benchmark).
- `26.93x` faster active execution than naive sequential fan-out.
- `compress_token` + `prune_idle` keep inactive subagents as auditable tokens,
  so a million-subagent task still fits in a small working set.
- `LaneWorkerPool` enforces bounded per-lane concurrency
  (`YOOL_TUPLE_LANE_CONCURRENCY=32`, `YOOL_TUPLE_MAX_LANE_CONCURRENCY=64`),
  so a million-subagent graph never turns into a million concurrent calls.
- Provider safety stays intact: receipt/input cache, jittered backoff,
  circuit breakers, and small-task batching apply at the million-subagent
  scale exactly as they do at one.
- Observability is first-class: snapshots include per-lane token/cost usage,
  circuit breaker state, cache stats, and lane worker success/failure metrics.
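The headline number is `branching ** depth` leaves, and any one leaf can be addressed without building its siblings. A minimal sketch of that idea, assuming a plain mixed-radix path encoding instead of the kernel's Hilbert indexing; `VirtualFanout`, `path_of`, and `visit` are hypothetical names, not the `batch_spawn` API:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class VirtualFanout:
    """Hypothetical lazy fan-out: stores only counts until a leaf is visited."""
    depth: int
    branching: int
    materialized: Dict[int, Tuple[int, ...]] = field(default_factory=dict)

    @property
    def virtual_agents(self) -> int:
        # Leaves of a complete tree: branching ** depth (32 ** 4 == 1,048,576).
        return self.branching ** self.depth

    def path_of(self, leaf: int) -> Tuple[int, ...]:
        """Child index at each level, root first, for leaf number `leaf`."""
        if not 0 <= leaf < self.virtual_agents:
            raise IndexError(leaf)
        digits = []
        for _ in range(self.depth):
            leaf, digit = divmod(leaf, self.branching)
            digits.append(digit)
        return tuple(reversed(digits))

    def visit(self, leaf: int) -> Tuple[int, ...]:
        # Only visited leaves occupy memory; the rest stay virtual.
        if leaf not in self.materialized:
            self.materialized[leaf] = self.path_of(leaf)
        return self.materialized[leaf]


if __name__ == "__main__":
    fan = VirtualFanout(depth=4, branching=32)
    print(fan.virtual_agents)     # 1048576
    print(fan.visit(33))          # (0, 0, 1, 1)
    print(len(fan.materialized))  # 1
```

Memory grows with the number of visited leaves, not with the virtual total, which is the property the receipts-over-lists design relies on.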

The output shape stays auditable at any scale:

```text
[Tuple Space Snapshot]
[Active Agents/Subagents]      ← materialized, small
[Total Agents/Subagents]       ← virtual, up to 1,000,000+
[Next Yool to Execute]
[Partial Result]
```

See [`prompts/agent-runtime-batch.md`](prompts/agent-runtime-batch.md)
and [`kernel/yool_tuple_kernel.py`](kernel/yool_tuple_kernel.py) for the
canonical `batch_spawn` contract.

## Real subagents — any model, any provider

> **Point `simplicio-prompt` at any OpenAI-compatible provider and one task fans
> out across 200 real subagents by default** — actual LLM calls, not placeholders,
> through the same safe-speed runtime. **The provider doesn't matter**: DeepSeek,
> MiMo, OpenRouter (any model it gateways), a free local LLM (Ollama / vLLM /
> LM Studio), or any custom endpoint all work through the same `--provider` /
> `--model` flags.
>
> Drop to a smaller `--subagents` count for tight budgets; opt in to
> `--subagents 600` (or higher) for max-breadth: large parallel audits,
> item-parallel work where N items ≥ 200, or running the live-bench scenario
> below.
>
> Verified live on OpenRouter (`deepseek/deepseek-chat`) with the max-breadth
> setting: **600/600 subagents, 0 failures, ~103s, ≈$0.045** at illustrative
> pricing on a single task. Change `--subagents` to run any count.

`batch_spawn` represents subagents *virtually*. To make them **real**, the kernel
ships a dependency-free, OpenAI-compatible provider client and a fan-out runtime
that drains N materialized subagent tuples through `LaneWorkerPool` — so every
real call inherits the same guardrails (bounded per-lane concurrency, the
receipt/input cache that de-duplicates identical prompts, jittered backoff, and a
provider circuit breaker).
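Two of those guardrails, input-hash de-duplication and bounded concurrency, can be illustrated with the standard library alone. This is a stand-in sketch under stated assumptions, not `LaneWorkerPool`: `drain`, `call_model`, and the SHA-256 input key are hypothetical, and the thread-pool size plays the role of the per-lane concurrency cap.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def _key(prompt: str) -> str:
    # Content-addressed input key: identical prompts share one entry.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def drain(prompts: List[str], call_model: Callable[[str], str],
          lane_concurrency: int = 32) -> List[str]:
    """Hypothetical drain loop for N subagent prompts.

    - Identical prompts are sent once (input-hash de-duplication).
    - The pool size bounds in-flight calls to `lane_concurrency`,
      however many prompts are queued.
    """
    unique: Dict[str, str] = {}
    for prompt in prompts:
        unique.setdefault(_key(prompt), prompt)
    with ThreadPoolExecutor(max_workers=lane_concurrency) as pool:
        answers = dict(zip(unique, pool.map(call_model, unique.values())))
    # Fan results back out in the original order.
    return [answers[_key(prompt)] for prompt in prompts]
```

A real provider call would also need retries, backoff, and a breaker around `call_model`, which this sketch deliberately omits.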

- [`kernel/providers.py`](kernel/providers.py) — `LLMProvider` + presets for
  `deepseek`, `mimo`, `local`/`ollama`, and a generic OpenAI-compatible config,
  with per-call token usage and **cost accounting**.
- [`kernel/subagent_runtime.py`](kernel/subagent_runtime.py) — `SubagentRuntime`
  fans a task across N real subagents and aggregates results, tokens, and cost.

Run **real** subagents on DeepSeek. Default fan-out is **200**; opt in to
`--subagents 600` for max-breadth runs.

```bash
export DEEPSEEK_API_KEY=sk-...

# default 200 subagents
python kernel/subagent_runtime.py --provider deepseek \
  --task "Brainstorm edge cases and tests for a distributed rate limiter"

# opt-in: max-breadth fan-out for large parallel audits
python kernel/subagent_runtime.py --provider deepseek --subagents 600 \
  --task "Audit these 600 functions in parallel"

# offline cost projection / demo — no key, no network:
python kernel/subagent_runtime.py --provider deepseek --subagents 600 \
  --task "..." --dry-run

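# other providers (default 200 subagents)
python kernel/subagent_runtime.py --provider mimo  --task "..."
python kernel/subagent_runtime.py --provider openrouter --model deepseek/deepseek-chat --task "..."   # OPENROUTER_API_KEY
python kernel/subagent_runtime.py --provider local --subagents 50 --task "..."   # Ollama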
```

Or programmatically:

```python
from kernel.providers import resolve_provider_config, LLMProvider
from kernel.subagent_runtime import SubagentRuntime

provider = LLMProvider(resolve_provider_config("deepseek"))
# default fan-out is 200; bump to 600 for max-breadth parallel audits
report = SubagentRuntime(provider).run("Audit this module", subagents=200)
print(report.format_summary())   # completed/failed, tokens, total cost in USD
```

### Provider configuration

| Preset | Default base URL | API key env | Notes |
|---|---|---|---|
| `deepseek` | `https://api.deepseek.com/v1` | `DEEPSEEK_API_KEY` | cheap cloud, OpenAI-compatible |
| `mimo` | `https://api.mimo.ai/v1` | `MIMO_API_KEY` | set `MIMO_BASE_URL` to your endpoint |
| `openrouter` | `https://openrouter.ai/api/v1` | `OPENROUTER_API_KEY` | OpenAI-compatible gateway; set `--model` (e.g. `deepseek/deepseek-chat`) |
| `local` / `ollama` | `http://localhost:11434/v1` | none | free/offline (Ollama, vLLM, LM Studio) |
| *(any other id)* | `SIMPLICIO_LLM_BASE_URL` | `SIMPLICIO_LLM_API_KEY` | generic OpenAI-compatible |

Per-provider env overrides (example for DeepSeek): `DEEPSEEK_BASE_URL`,
`DEEPSEEK_MODEL`, `DEEPSEEK_PROMPT_COST_PER_MTOK`,
`DEEPSEEK_COMPLETION_COST_PER_MTOK`. **Cost defaults are illustrative** — set the
`*_COST_PER_MTOK` vars to your live contract pricing (e.g. `0.003`) so the
reported `cost_usd` matches your bill.
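The preset table and override rules read naturally as a lookup with environment overrides, with cost computed per million tokens. The sketch below is hypothetical and is not `resolve_provider_config`: `SketchConfig` and `resolve` are illustrative names, zero default prices are an assumption, the `<ID>_` prefix generalizes the DeepSeek example to other presets, and the `SIMPLICIO_LLM_*_COST_PER_MTOK` names are assumed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# (default base URL, API-key env var) per preset, copied from the table above.
PRESETS = {
    "deepseek": ("https://api.deepseek.com/v1", "DEEPSEEK_API_KEY"),
    "mimo": ("https://api.mimo.ai/v1", "MIMO_API_KEY"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
    "local": ("http://localhost:11434/v1", None),
    "ollama": ("http://localhost:11434/v1", None),
}


@dataclass
class SketchConfig:
    base_url: str
    api_key: Optional[str]
    prompt_cost_per_mtok: float = 0.0
    completion_cost_per_mtok: float = 0.0

    def cost_usd(self, prompt_tokens: int, completion_tokens: int) -> float:
        # *_COST_PER_MTOK prices are USD per one million tokens.
        return (prompt_tokens * self.prompt_cost_per_mtok
                + completion_tokens * self.completion_cost_per_mtok) / 1_000_000


def resolve(provider: str, env: Optional[Mapping[str, str]] = None) -> SketchConfig:
    """Hypothetical resolver following the override pattern shown for DeepSeek."""
    env = os.environ if env is None else env
    if provider in PRESETS:
        default_url, key_var = PRESETS[provider]
        prefix = provider.upper()  # assumed: DEEPSEEK_, MIMO_, OPENROUTER_, ...
        base_url = env.get(prefix + "_BASE_URL", default_url)
        api_key = env.get(key_var) if key_var else None
    else:
        # Any other id: generic OpenAI-compatible endpoint (KeyError if unset).
        prefix = "SIMPLICIO_LLM"  # assumed prefix for the cost variables too
        base_url = env["SIMPLICIO_LLM_BASE_URL"]
        api_key = env.get("SIMPLICIO_LLM_API_KEY")
    return SketchConfig(
        base_url=base_url,
        api_key=api_key,
        prompt_cost_per_mtok=float(env.get(prefix + "_PROMPT_COST_PER_MTOK", "0")),
        completion_cost_per_mtok=float(env.get(prefix + "_COMPLETION_COST_PER_MTOK", "0")),
    )
```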

## Acknowledgement

Special thanks to [Jesse Daniel Brown, PhD](https://github.com/JesseBrown1980), my mentor, a California, USA native and author of 100+ scientific articles. His humanitarian and educational perspective on programming, AI, and scientific work helped reinforce the mission behind this repository: practical agent systems that increase human capability through safer, more auditable automation.

## V2 safe-speed infographics

### English
![YOOL V2 Safe-Speed Runtime infographic in English](docs/assets/yool-v2-safe-speed-infographic-en.png)

### Brazilian Portuguese
![YOOL V2 Safe-Speed Runtime infographic in Portuguese Brazil](docs/assets/yool-v2-safe-speed-infographic-pt.png)

### Infographic Explanation

The infographics compare a loose prompt flow against the `simplicio-prompt` V2
safe-speed runtime. The left side shows the old failure modes: flat agent lists,
sequential work, repeated provider calls, no cache, fixed concurrency, retry
storms, large LLM context, and weak audit trails.

The right side shows the V2 path: tuple-space routing, lazy `batch_spawn`,
adaptive `LaneWorkerPool`, receipt/input cache, small-task batching, circuit
breakers, backoff with jitter, context compression, local yool routing, and
speculative execution only for idempotent work. The practical result is faster
delivery through avoided repeat work and safer provider behavior, not through
unbounded calls.

Measured V2 benchmark highlights:

- Scale representation: `2,833.75x` faster than a normal instruction flow.
- Active execution: `26.93x` faster than normal sequential execution.
- Cache: `4x` fewer provider calls, a `75%` reduction.
- Batching: `32x` fewer small-task calls, a `96.88%` reduction.
- Circuit breaker: `64x` fewer failure attempts, a `98.44%` reduction.
- Token economy: `76.32%` estimated savings through context compression.
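Each fold reduction converts to a percentage as `1 - 1/k`. The short check below only re-derives that arithmetic for the cache, batching, and circuit-breaker figures; it does not reproduce the benchmark itself.

```python
def reduction_pct(fold: float) -> float:
    """Percent fewer calls when a count shrinks k-fold: (1 - 1/k) * 100."""
    return round((1 - 1 / fold) * 100, 2)


for name, fold in [("cache", 4), ("batching", 32), ("circuit breaker", 64)]:
    print(f"{name}: {fold}x fewer -> {reduction_pct(fold)}% reduction")
```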

## Quick read

- `YOOL_TUPLE_HAMT.md` - full spec with diagrams, algorithms, examples, guardrails.
- `kernel/yool_tuple_kernel.py` - reference Python kernel with lazy `batch_spawn`,
  `compress_token`, hookwall, indexed tuple-space scans, and lane worker fan-out.
- `prompts/agent-runtime-execution-prompt.md` - ready prompt for Claude, Codex,
  Hermes, and other coding agents.
- `examples/` - runnable minimal implementations (Python, Node).
- `guardrails/` - CPU throttle + disk GC reference implementations.
- `adopters.md` - projects that vendor this spec.

## Install via npm

The repo ships as an npm package and as a **multi-IDE plugin**. Use it without
cloning:

```bash
# print the full prompt
npx simplicio-prompt

# print only the `## Prompt` body (no surrounding markdown)
npx simplicio-prompt --raw

# install as a plugin for one (or many) coding agents
npx simplicio-prompt --target claude-code     # → CLAUDE.md
npx simplicio-prompt --target codex           # → AGENTS.md
npx simplicio-prompt --target hermes          # → AGENTS.md
npx simplicio-prompt --target opencode        # → AGENTS.md  (alias: openclaw)
npx simplicio-prompt --target cursor          # → .cursor/rules/*.mdc + .cursorrules
npx simplicio-prompt --target copilot         # → .github/copilot-instructions.md
npx simplicio-prompt --target cline           # → .clinerules/simplicio-prompt.md
npx simplicio-prompt --target aider           # → CONVENTIONS.md
npx simplicio-prompt --target gemini          # → GEMINI.md
npx simplicio-prompt --install-all            # → every target above

# inspect / discover
npx simplicio-prompt --list-targets
```

Or add it as a dependency and consume it programmatically:

```bash
npm install simplicio-prompt
```

```js
import {
  getPrompt,
  getPromptSection,
  getPromptPath,
  getTargets,
  findTarget,
} from "simplicio-prompt";

const fullMarkdown = getPrompt();        // entire prompt file
const promptOnly   = getPromptSection(); // just the `## Prompt` body
const filePath     = getPromptPath();    // absolute path on disk
const targets      = getTargets();       // multi-IDE plugin target registry
const cursor       = findTarget("cursor");
```

Every install wraps the prompt in `<!-- simplicio-prompt:start -->` /
`<!-- simplicio-prompt:end -->` markers so reinstalling updates the block in
place instead of duplicating it. The Cursor `.mdc` rule and any new directory
(`.cursor/rules/`, `.github/`, `.clinerules/`) are created automatically.
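The marker-based reinstall is an idempotent upsert: replace the delimited block if it exists, otherwise append it. A minimal Python sketch of that behavior, assuming the markers are matched literally and the first block wins; `upsert_text` and `install` are hypothetical names, and the npm installer itself is JavaScript:

```python
import re
from pathlib import Path

START = "<!-- simplicio-prompt:start -->"
END = "<!-- simplicio-prompt:end -->"
_BLOCK = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)


def upsert_text(text: str, body: str) -> str:
    """Replace the first marked block in `text`, or append a new one."""
    block = f"{START}\n{body.rstrip()}\n{END}"
    if _BLOCK.search(text):
        # A lambda replacement avoids re-interpreting backslashes in `block`.
        return _BLOCK.sub(lambda _m: block, text, count=1)
    sep = "\n\n" if text.strip() else ""
    return text.rstrip() + sep + block + "\n"


def install(path: Path, body: str) -> None:
    # Create missing directories such as .cursor/rules/ or .clinerules/.
    path.parent.mkdir(parents=True, exist_ok=True)
    old = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(upsert_text(old, body), encoding="utf-8")
```

Running `install` twice with different bodies leaves exactly one block holding the newer body, and content outside the markers is untouched.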

## Install via PyPI (Python)

The reference kernel ships as a **dependency-free** Python package, so you can
run real subagents on any OpenAI-compatible provider without cloning the repo:

```bash
pip install simplicio-prompt
```

```python
from kernel.providers import resolve_provider_config, LLMProvider
from kernel.subagent_runtime import SubagentRuntime
from kernel.yool_tuple_kernel import build_default_space

# 1,000,000+ subagents represented lazily, materialized only when visited
space, root = build_default_space()
receipt = space.batch_spawn(root, "codex_worker", depth=4, branching=32)
print(receipt.virtual_agents)  # 1048576

# real subagents on any provider (DeepSeek / MiMo / OpenRouter / local)
provider = LLMProvider(resolve_provider_config("deepseek"))
# default fan-out is 200; bump to 600 for max-breadth parallel audits
report = SubagentRuntime(provider).run("audit this module", subagents=200)
print(report.format_summary())  # completed/failed, tokens, total cost in USD
```

The install also exposes a console entry point for offline cost projection and
live fan-out:

```bash
# default fan-out (200 subagents) — offline cost projection, no API key
simplicio-subagents --provider deepseek --task "..." --dry-run

# max-breadth fan-out (opt-in) for large parallel audits
simplicio-subagents --provider deepseek --subagents 600 --task "..." --dry-run
```

### Optional Rust acceleration (PyO3)

For hot paths (prompt section extraction, multi-needle pattern scan, and
guardrail validation) the repo ships an **optional** native extension
written in Rust with PyO3 bindings. The default `pip install simplicio-prompt`
remains pure-Python and dependency-free; the native extension is opt-in.

Build and install the extension locally:

```bash
pip install maturin
cd rust && maturin develop --release
```

That produces `_simplicio_native` in the current Python environment.
`kernel/utils/accel.py` transparently picks it up on the next import:

```python
from kernel.utils import accel

accel.backend_name()                              # "native" (or "python" w/o the wheel)
accel.extract_prompt_section(text)                # native if available, identical output
accel.find_patterns(text, ["needle-a", "needle-b"])
accel.validate_guardrails(["rule-a", "rule-b"], text)
```
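The transparent pick-up follows the usual optional-extension pattern: try the compiled module, fall back to pure Python. A minimal sketch of that pattern, not the contents of `kernel/utils/accel.py`; it assumes the native module exposes a `find_patterns` with the same signature, and that the function returns the list of needles found, both of which are illustrative assumptions:

```python
from typing import List

try:
    import _simplicio_native as _native  # produced by `maturin develop`
except ImportError:  # the pure-Python path keeps the install dependency-free
    _native = None


def backend_name() -> str:
    return "native" if _native is not None else "python"


def find_patterns(text: str, needles: List[str]) -> List[str]:
    """Hypothetical contract: return the needles that occur in `text`."""
    fast = getattr(_native, "find_patterns", None) if _native is not None else None
    if fast is not None:
        return fast(text, needles)
    return [needle for needle in needles if needle in text]
```

Resolving the backend once at import time keeps the per-call overhead to a single attribute lookup.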

The native path is parity-tested against the pure-Python fallback
(`tests/test_accel.py`), so both backends produce byte-for-byte identical
output and callers need not care which one is installed. Measured speedups on
local builds:

```text
extract / ONESHOT (~75 lines)             ~3.0x
extract / BATCH   (~280 lines)            ~3.1x
extract / 5KB body                        ~7.0x
find_patterns / 4 needles in 5KB          ~2.2x
validate_guardrails / 6 rules in 5KB      ~7.4x
```

(50k iterations each, Rust 1.94, CPython 3.11.) Run
`python benchmarks/accel_benchmark.py` to reproduce locally.

The native wheel is **not** published to PyPI yet; that requires a
manylinux/macOS/Windows wheel matrix in CI. For now,
`cd rust && maturin develop` is the canonical install path.

### Multi-IDE plugin matrix

| Target id | IDE / agent | Files written |
|---|---|---|
| `claude-code` | Anthropic Claude Code | `CLAUDE.md` |
| `codex` | OpenAI Codex CLI | `AGENTS.md` |
| `hermes` | Nous Research Hermes | `AGENTS.md` |
| `opencode` (alias `openclaw`) | OpenCode / OpenClaw | `AGENTS.md` |
| `cursor` | Cursor IDE | `.cursor/rules/simplicio-prompt.mdc`, `.cursorrules` |
| `copilot` (alias `github-copilot`) | GitHub Copilot | `.github/copilot-instructions.md` |
| `cline` | Cline (VS Code) | `.clinerules/simplicio-prompt.md` |
| `aider` | Aider | `CONVENTIONS.md` |
| `gemini` (alias `gemini-cli`) | Google Gemini CLI | `GEMINI.md` |

### Auto-invocation per agent

The runtime is meant to be **always-on** — every user message is treated as the
task and routed through the runtime, with no trigger keyword. How that
invocation is wired depends on the agent:

| Agent | Invocation mechanism | Always-on? |
|---|---|---|
| **Claude Code** | `UserPromptSubmit` **hook** (`plugins/claude-code/hooks/`) injects the runtime contract on every prompt; plus a `simplicio-runtime` **skill** and slash commands | yes, programmatic |
| Codex / Hermes / OpenCode | `AGENTS.md` loaded as standing instructions | yes (context file) |
| Cursor | `.cursor/rules/*.mdc` with `alwaysApply: true` | yes (always-apply rule) |
| GitHub Copilot | `.github/copilot-instructions.md` | yes (context file) |
| Cline | `.clinerules/simplicio-prompt.md` | yes (context file) |
| Aider | `CONVENTIONS.md` | yes (context file) |
| Gemini CLI | `GEMINI.md` loaded every turn | yes (context file) |

A true prompt-submit *hook* (inject-on-every-prompt with programmatic
stand-down) exists only in Claude Code. For the other agents, the always-applied
context file **is** the invocation — it is read on every turn, so the contract is
always present. All of them honor in-message stand-down phrases ("stop",
"cancel", "exit runtime", "ignore simplicio").
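Stand-down detection is a small classification step that runs before the contract is injected. A Python sketch of that decision only (the Claude Code hook itself is `.mjs`), assuming the whole message must be a stand-down phrase so that a real task such as "stop the flaky test from retrying" still goes through the runtime; `is_stand_down` is a hypothetical name:

```python
import re
from typing import Iterable

# Stand-down phrases quoted above; the matching rules are assumptions.
STAND_DOWN = ("stop", "cancel", "exit runtime", "ignore simplicio")


def is_stand_down(message: str, phrases: Iterable[str] = STAND_DOWN) -> bool:
    """True when the entire message, ignoring case and punctuation, is a stand-down phrase."""
    normalized = re.sub(r"[^\w\s]", "", message).strip().lower()
    normalized = " ".join(normalized.split())
    return normalized in tuple(phrases)
```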

### Claude Code plugin

`plugins/claude-code/` is a full Claude Code plugin: drop it into
`~/.claude/plugins/simplicio-prompt/` (or your project's `.claude/plugins/`) to
get an always-on hook, three slash commands, and a runtime skill:

- **`UserPromptSubmit` hook** (`hooks/hooks.json` → `hooks/user-prompt-submit.mjs`)
  — injects the runtime contract (`hooks/runtime-directive.md`) on every prompt,
  unless the prompt is an explicit stand-down. This is the always-on invocation.
- `/simplicio <task>` — run the next task through the Tuple-Space + Yool runtime.
- `/simplicio-install <target>` — install the runtime contract into the
  current repo (`claude-code`, `codex`, `cursor`, `copilot`, `gemini`, `all`, …).
- `/simplicio-status on|off|<field>:on` — toggle the opt-in status output.
- `simplicio-runtime` skill — auto-activates in repos that vendor the spec.

### Legacy `--install`

The original single-file installer still works for backwards compatibility:

```bash
npx simplicio-prompt --install CLAUDE.md
npx simplicio-prompt --install AGENTS.md
npx simplicio-prompt --install .cursorrules
```

## Integration adapter

For orchestration repos that want the runtime without copying tuple boilerplate,
use the Python adapter:

```python
from examples.python.prompt_fanout import PromptFanout

fanout = PromptFanout(repo="my-service", authority="simplicio-sprint")
root, receipt = fanout.spawn_task(
    "review checkout edge cases",
    mapper_context={"target": "src/checkout.ts"},
    depth=2,
    branching=8,
)
fanout.record_tokens("analysis", prompt_tokens=1200, completion_tokens=300, cost_usd=0.02)
print(receipt.virtual_agents)
print(fanout.snapshot()["token_usage"])
```

`simplicio-dev-cli` can use the same adapter for internal verification
reasoning when it already has structured mapper context.

## How to use the prompt

Use `simplicio-prompt` as a canonical execution prompt for coding agents such as
Claude, Codex, Hermes, Cursor, Cline, or any assistant that can read repository
instructions.

1. Run `npx simplicio-prompt --install CLAUDE.md` (or paste the `## Prompt`
   section from [`prompts/agent-runtime-execution-prompt.md`](prompts/agent-runtime-execution-prompt.md)
   into `AGENTS.md`, `CLAUDE.md`, `.cursorrules`, or a custom system prompt).
2. In the target repository, just ask for work in your own words. **You do not
   need to start the message with `Implement`** — any user input (a sentence, a
   bug description, a code snippet, a one-word request) is treated as the task
   `X` and routed through the same runtime. The only opt-outs are explicit
   stand-down phrases like "stop", "cancel", "exit runtime".
3. The agent will read the canonical files listed in the prompt, decompose the
   task into a Hilbert-indexed tuple graph, create a root tuple, route active
   work through tuple-space primitives, and use `LaneWorkerPool` plus the V2
   safe-speed controls.
4. Status output is **opt-in** (default: silent). Enable with
   `YOOL_TUPLE_STATUS=true` (or `status_output=true` runtime flag). When on,
   the agent returns this shape:

   ```text
   [Tuple Space Snapshot]
   [Active Agents/Subagents]
   [Total Agents/Subagents]
   [Next Yool to Execute]
   [Partial Result]
   ```

   Per-field toggles (default `false`): `YOOL_TUPLE_STATUS_SNAPSHOT`,
   `YOOL_TUPLE_STATUS_ACTIVE`, `YOOL_TUPLE_STATUS_TOTAL`,
   `YOOL_TUPLE_STATUS_NEXT`, `YOOL_TUPLE_STATUS_PARTIAL`.
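One plausible reading of these toggles is: the master switch shows every field, and otherwise each field appears only if its own toggle is on. The sketch below encodes that reading; the precedence, the accepted truthy spellings, and the `status_fields` name are assumptions, and the `status_output` runtime flag is not modeled.

```python
import os
from typing import Dict, Mapping, Optional

# Field name -> per-field toggle, as listed above.
FIELD_TOGGLES = {
    "snapshot": "YOOL_TUPLE_STATUS_SNAPSHOT",
    "active": "YOOL_TUPLE_STATUS_ACTIVE",
    "total": "YOOL_TUPLE_STATUS_TOTAL",
    "next": "YOOL_TUPLE_STATUS_NEXT",
    "partial": "YOOL_TUPLE_STATUS_PARTIAL",
}


def _is_on(value: Optional[str]) -> bool:
    # Assumed truthy spellings; unset or anything else counts as off.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def status_fields(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Hypothetical resolution: silent by default, master switch shows all fields."""
    env = os.environ if env is None else env
    show_all = _is_on(env.get("YOOL_TUPLE_STATUS"))
    return {name: show_all or _is_on(env.get(toggle))
            for name, toggle in FIELD_TOGGLES.items()}
```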

For high-throughput local runs, set the runtime environment variables before
starting the agent or scripts:

```powershell
$env:YOOL_TUPLE_LANE_CONCURRENCY="32"
$env:YOOL_TUPLE_MAX_LANE_CONCURRENCY="64"
$env:YOOL_TUPLE_CPU_QUOTA_PCT="95"
$env:YOOL_TUPLE_QUEUE_MAXSIZE="8192"
$env:YOOL_TUPLE_COMPRESSION_THRESHOLD="1024"
$env:YOOL_TUPLE_CACHE_MAX_ENTRIES="16384"
$env:YOOL_TUPLE_CACHE_TTL_S="3600"
$env:YOOL_TUPLE_API_MAX_RETRIES="3"
$env:YOOL_TUPLE_API_BACKOFF_BASE_MS="100"
$env:YOOL_TUPLE_API_BACKOFF_MAX_MS="5000"
$env:YOOL_TUPLE_CIRCUIT_FAILURE_THRESHOLD="5"
$env:YOOL_TUPLE_CIRCUIT_COOLDOWN_S="30"
$env:YOOL_TUPLE_BATCH_SMALL_TASK_SIZE="32"
$env:YOOL_TUPLE_CONTEXT_COMPRESSION_CHARS="6000"
```

Run the reference kernel and tests:

```bash
python kernel/yool_tuple_kernel.py
python -m unittest discover -s tests -p "test_*.py"
```

## V2 benchmark report

The V2 report is the main evidence for the safe-speed runtime. Read it before
adopting the prompt in another project:

- [V2 Markdown report](benchmarks/v2_safe_speed_results.md)
- [V2 PDF report](benchmarks/v2_safe_speed_benchmark.pdf)
- [V2 benchmark script](benchmarks/v2_safe_speed_benchmark.py)
- [V2 PDF generator](benchmarks/generate_v2_benchmark_pdf.py)

What the V2 report shows:

- `2,833.75x` faster scale representation than normal instruction flow.
- `26.93x` faster active execution than normal sequential execution.
- `4x` fewer repeated provider calls through receipt/input cache.
- `32x` fewer small-task calls through batching.
- `64x` fewer provider failure attempts through circuit breakers.
- `76.32%` estimated token savings through context compression.

The key point: V2 speeds up by avoiding repeated work and controlling provider
pressure. It does not depend on unsafe infinite calls, unbounded concurrency, or
retry storms.

## High-throughput runtime defaults

The reference kernel is tuned for speed while keeping host guardrails explicit:

| Env var | Default | Purpose |
|---|---:|---|
| `YOOL_TUPLE_LANE_CONCURRENCY` / `YOOL_LANE_CONCURRENCY` | `32` | Preferred workers per lane. |
| `YOOL_TUPLE_MAX_LANE_CONCURRENCY` / `YOOL_MAX_LANE_CONCURRENCY` | `64` | Ceiling for workers per lane. |
| `YOOL_TUPLE_CPU_QUOTA_PCT` / `YOOL_CPU_QUOTA_PCT` | `95` | Default per-yool CPU budget. |
| `YOOL_TUPLE_QUEUE_MAXSIZE` / `YOOL_QUEUE_MAXSIZE` | `8192` | Lane queue scan cap. |
| `YOOL_TUPLE_COMPRESSION_THRESHOLD` / `YOOL_COMPRESSION_THRESHOLD` | `1024` | Active materialized agents before pruning. |
| `YOOL_TUPLE_CACHE_MAX_ENTRIES` / `YOOL_CACHE_MAX_ENTRIES` | `16384` | Receipt/input-hash cache size. |
| `YOOL_TUPLE_CACHE_TTL_S` / `YOOL_CACHE_TTL_S` | `3600` | Cache TTL in seconds. |
| `YOOL_TUPLE_API_MAX_RETRIES` / `YOOL_API_MAX_RETRIES` | `3` | Retry budget for transient API/LLM failures. |
| `YOOL_TUPLE_API_BACKOFF_BASE_MS` / `YOOL_API_BACKOFF_BASE_MS` | `100` | Initial jittered backoff delay. |
| `YOOL_TUPLE_API_BACKOFF_MAX_MS` / `YOOL_API_BACKOFF_MAX_MS` | `5000` | Backoff ceiling. |
| `YOOL_TUPLE_CIRCUIT_FAILURE_THRESHOLD` / `YOOL_CIRCUIT_FAILURE_THRESHOLD` | `5` | Failures before opening provider breaker. |
| `YOOL_TUPLE_CIRCUIT_COOLDOWN_S` / `YOOL_CIRCUIT_COOLDOWN_S` | `30` | Provider cooldown after breaker opens. |
| `YOOL_TUPLE_BATCH_SMALL_TASK_SIZE` / `YOOL_BATCH_SMALL_TASK_SIZE` | `32` | Default small-task batch size. |
| `YOOL_TUPLE_CONTEXT_COMPRESSION_CHARS` / `YOOL_CONTEXT_COMPRESSION_CHARS` | `6000` | Large LLM context compression threshold. |
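Each setting has two names plus a default. A minimal sketch of resolving them, assuming the `YOOL_TUPLE_` spelling takes precedence over the shorter `YOOL_` alias and that the preferred lane size is clamped to the ceiling; `env_int`, `lane_workers`, and the subset in `DEFAULTS` are hypothetical:

```python
import os
from typing import Mapping, Optional

# A subset of the defaults from the table above.
DEFAULTS = {
    "LANE_CONCURRENCY": 32,
    "MAX_LANE_CONCURRENCY": 64,
    "QUEUE_MAXSIZE": 8192,
    "CACHE_MAX_ENTRIES": 16384,
    "API_MAX_RETRIES": 3,
}


def env_int(name: str, env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical lookup order: YOOL_TUPLE_<NAME>, then YOOL_<NAME>, then default."""
    env = os.environ if env is None else env
    for key in ("YOOL_TUPLE_" + name, "YOOL_" + name):
        raw = env.get(key)
        if raw:  # unset or empty falls through to the next source
            return int(raw)
    return DEFAULTS[name]


def lane_workers(env: Optional[Mapping[str, str]] = None) -> int:
    # The preferred per-lane worker count never exceeds the ceiling.
    return min(env_int("LANE_CONCURRENCY", env), env_int("MAX_LANE_CONCURRENCY", env))
```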

Safe speedups now live in the kernel, not only in the prompt: receipt/input
cache, adaptive lane concurrency, jittered backoff, provider circuit breakers,
small-task batching, prompt/context compression, local yool routing, and
speculative execution only for tuples marked `idempotent=True`.
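Jittered backoff and a provider circuit breaker are standard techniques; the sketch below shows one common variant of each with the table's defaults (3 retries, 100 ms base, 5000 ms ceiling, 5 failures, 30 s cooldown). The full-jitter formula, the "consecutive failures" rule, and the single half-open probe after cooldown are assumptions rather than a description of the kernel's code; `backoff_delays` and `CircuitBreaker` are hypothetical names.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(retries: int = 3, base_ms: int = 100, max_ms: int = 5000,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Full-jitter exponential backoff, in seconds: uniform(0, min(max, base * 2**n))."""
    rng = rng or random.Random()
    for attempt in range(retries):
        cap_ms = min(max_ms, base_ms * (2 ** attempt))
        yield rng.uniform(0, cap_ms) / 1000.0


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; rejects calls for `cooldown_s`."""

    def __init__(self, threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Half-open: let one probe through; a single failure re-opens.
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

Injecting the clock keeps the breaker testable without sleeping, and randomizing the whole delay window spreads retries from many subagents instead of synchronizing them into a retry storm.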

Benchmark reports:

- [Prompt vs normal Markdown](benchmarks/prompt_vs_normal_results.md)
- [Prompt vs normal PDF](benchmarks/prompt_vs_normal_benchmark.pdf)
- [V2 safe-speed Markdown](benchmarks/v2_safe_speed_results.md)
- [V2 safe-speed PDF](benchmarks/v2_safe_speed_benchmark.pdf)

## Why a separate repo

The pattern is cross-project: SendSprint, llm-project-mapper, and future agents all consume the same spec. One source of truth, vendored on demand.

## License

MIT © Wesley Simplicio. See [`LICENSE`](LICENSE).
