Metadata-Version: 2.4
Name: rtxclaw
Version: 0.3.0
Summary: Sovereign-inference TUI chat against an OpenAI-compatible vLLM endpoint.
Author: rtxclaw.ai
License-Expression: MIT
Keywords: llm,tui,vllm,agent,acp,sovereign-inference
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: textual>=0.85
Requires-Dist: httpx>=0.27
Requires-Dist: aiohttp>=3.9
Requires-Dist: rich>=13.0
Requires-Dist: trafilatura>=2.0
Requires-Dist: openai>=1.0
Requires-Dist: sqlite-vec>=0.1.6
Requires-Dist: mcp>=1.0
Requires-Dist: markitdown[pdf]>=0.0.1a3
Requires-Dist: markitdown-mcp>=0.0.1a3
Requires-Dist: agent-client-protocol>=0.9
Requires-Dist: hypercorn>=0.17
Requires-Dist: starlette>=0.40
Requires-Dist: sse-starlette>=2.1
Requires-Dist: imageio-ffmpeg>=0.5.1
Requires-Dist: yt-dlp>=2024.10.7
Requires-Dist: youtube-transcript-api>=1.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-xdist>=3.5; extra == "dev"
Requires-Dist: pytest-timeout>=2.3; extra == "dev"
Requires-Dist: import-linter>=2.0; extra == "dev"
Requires-Dist: jsonschema>=4; extra == "dev"
Requires-Dist: telethon>=1.36; extra == "dev"
Requires-Dist: gTTS>=2.4; extra == "dev"
Dynamic: license-file

# rtxclaw.ai

> **Run it now:** `git clone … && cd rtxclaw && ./rtxclaw` —
> the launcher creates `.venv/`, installs dependencies, and starts
> the TUI. See **[Getting started](docs/getting-started.md)** for
> first-time configuration, the workspace layout, env knobs, and the
> `RTXCLAW_HOME=~/.rtxclaw-test` test-sandbox pattern.

## Install

```bash
pip install rtxclaw
```

That puts a `rtxclaw` command on your `PATH`. Run it once with no
arguments — the first-run wizard walks you through the LLM endpoint,
model, and (optionally) Telegram, then brings the gateway up itself.
Three steps and you're in the TUI:

> **`pip install rtxclaw` → `rtxclaw` → answer the wizard → done.**

`pyproject.toml` declares the `rtxclaw_*` packages, the `rtxclaw`
console script, and the full runtime dependency set; cutting a release
is a `python -m build && twine upload dist/*` away.

### From a clone (for development)

`git clone … && cd rtxclaw && ./rtxclaw` — the launcher creates
`.venv/`, runs an editable `pip install -e .`, and starts the TUI.

rtxclaw.ai is the cypherpunk version of inference.

It exists to empower the user to have total control over their data and ideas, without the hassle of endless configuration or requiring deep open-source model knowledge just to get useful work done.

This project starts from a hard truth: the AI industry is underinvesting in infrastructure and degrading model quality to keep up with demand. Decisions like blocking OpenClaw from the Max plan and forcing heavy API costs on users reinforce the view that AI companies are harvesting the ideas of people and companies the way the Matrix farms humans: as a crop of creativity and a way to study human thought processes.

That is the opposite of sovereignty.

The current model asks users and companies to pour their private context, internal reasoning, product ideas, and operational intelligence into centralized AI systems they do not control. In return, they get rising costs, shrinking access, degraded quality under load, and dependence on infrastructure decisions made by someone else.

And the risk is not theoretical.

The imminent Taiwan conflict will create heavy shocks to current business. Any company relying on AI will have no option but to pay the price of neoclouds if supply chains seize up and centralized inference tightens further. Businesses that chose not to build on-prem infrastructure, or at least retain the option, will be trapped into paying whatever the market demands.

rtxclaw is the answer to that trap.

rtxclaw is a custom-built agent system that adapts to the available inference capacity by creating tailored agents for each hardware profile, from a modest RTX 3060 to RTX 3090, RTX 4090, RTX 5090, A6000-class workstations, and up to advanced rented neocloud GPUs on platforms like Vast.ai.

Instead of forcing every task through one oversized, expensive, centralized stack, rtxclaw rightsizes inference to the real job:

- small agents on cheap local hardware
- stronger agents on workstations
- burst agents on rented neocloud GPUs
- flexible routing based on actual available capacity
- model selection based on task value, latency, and hardware envelope
- agent behavior shaped around the realities of the machine it runs on

Agents can be spawned in seconds, right-sized to the capacity each task needs, which reduces the cost of AI while increasing resilience, performance, and control.

## Why rtxclaw exists

Most AI products are built around a hidden assumption: the user should adapt to the vendor.

The vendor chooses the models.
The vendor chooses the pricing.
The vendor chooses when quality gets degraded.
The vendor chooses which products get blocked.
The vendor chooses which workloads are too expensive.
The vendor chooses whether your use case is welcome.

rtxclaw rejects that model.

The intelligence layer of a company is too important to outsource blindly. Your prompts are not just prompts. They are product direction, customer knowledge, internal process, strategy, failure modes, experimentation, and judgment in raw form. If your AI stack is not sovereign, your cognition stack is not sovereign.

## Core principles

- **Own the data**
- **Own the ideas**
- **Own the inference path**
- **Minimize configuration**
- **Avoid vendor lock-in**
- **Use the smallest capable model**
- **Adapt to available hardware**
- **Keep the system understandable**
- **Prefer tailored agents over monolithic bloat**
- **Treat inference as infrastructure, not magic**
- **Cancel scopes you can reason about** — abort one session and only that session (plus its subagents) dies; abort the gateway and everything dies. No mystery middle ground.

## Design philosophy

rtxclaw follows a simple philosophy: small enough to understand, flexible enough to adapt, powerful enough to matter.

Customization should come from code and agent behavior, not from endless configuration sprawl. The system should adapt to the user, the hardware, and the workload, not force the user to adapt to the limitations of a vendor’s pricing model or infrastructure bottlenecks.

This means:

- no blind dependence on one model provider
- no assumption that every task deserves frontier-model pricing
- no assumption that cloud is always the answer
- no assumption that local hardware is too weak to matter
- no assumption that one agent shape fits every machine

## Hardware-aware agents

rtxclaw is built around the idea that different hardware should produce different agent strategies.

A small local card like an RTX 3060 should not be treated the same way as a 4090, a 5090, an RTX A6000, or a high-memory neocloud GPU. The system should understand the available VRAM, throughput, latency, and cost envelope, then spawn the right kind of agent for the job.

Examples:

- **RTX 3060 / 4060-class**: lightweight routing, summarization, background memory work, small local copilots
- **RTX 3090 / 4090-class**: stronger coding agents, research agents, orchestration, hybrid local inference
- **RTX 5090-class**: high-end desktop inference, multi-agent local workflows, stronger reasoning on-prem
- **A6000 / workstation-class**: larger-context agents, heavier pipelines, persistent business-critical agent roles
- **Vast.ai / neocloud GPUs**: burst capacity, specialized heavy jobs, temporary swarms, overflow compute

The point is not to chase the biggest GPU.
The point is to make every GPU useful.
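
As an illustration of the tiering above, the sketch below maps a VRAM figure to one of the listed hardware classes. The `Tier` type, the `pick_tier` function, the role labels, and the VRAM cut-offs are all assumptions made for this example, not rtxclaw's actual detection logic. The cut-offs only follow the stock memory sizes of the cards named (12 GB for the RTX 3060, 24 GB for the 3090 and 4090, 32 GB for the 5090, 48 GB for the RTX A6000).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    roles: tuple[str, ...]


# Hypothetical tiers, named after the examples in the list above.
LIGHT = Tier("3060/4060-class", ("routing", "summarization", "memory"))
STRONG = Tier("3090/4090-class", ("coding", "research", "orchestration"))
HIGH_END = Tier("5090-class", ("multi-agent", "reasoning"))
WORKSTATION = Tier("A6000/workstation-class", ("large-context", "pipelines"))
BURST = Tier("neocloud", ("burst", "swarms", "overflow"))


def pick_tier(vram_gb: float, rented: bool = False) -> Tier:
    """Pick a tier from available VRAM; rented capacity is treated as burst."""
    if rented:
        return BURST
    if vram_gb >= 48:
        return WORKSTATION
    if vram_gb >= 32:
        return HIGH_END
    if vram_gb >= 24:
        return STRONG
    return LIGHT
```

Under these assumed cut-offs, `pick_tier(12)` lands in the lightweight tier and `pick_tier(24)` in the 3090/4090 tier. A real selector would presumably also weigh throughput, latency, and cost, as the paragraph above describes.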

## What rtxclaw does

rtxclaw creates a custom-built agent system that:

- detects or knows the available inference capacity
- matches tasks to the right runtime and hardware tier
- spawns agents in seconds
- routes work based on cost, latency, and model capability
- reduces unnecessary API dependence
- uses neocloud only when it makes economic or operational sense
- preserves the option of on-prem inference as a first-class path
- keeps the architecture understandable enough to modify

## What makes it cypherpunk

Cypherpunk systems assume the network is hostile, dependency is dangerous, and convenience without control becomes a trap.

rtxclaw applies that logic to inference.

- If your intelligence depends entirely on remote providers, you do not control your intelligence.
- If your private reasoning is continuously exported, you do not control your ideas.
- If your costs can be repriced overnight, you do not control your operating margin.
- If your access can be revoked by policy, demand spikes, or product decisions, you do not control your future.

Sovereign inference means keeping optionality.
Sovereign inference means designing for adversarial conditions.
Sovereign inference means your agent system can still function when cloud prices spike, access tightens, models get rate-limited, or supply chains crack.

## Why now

As models get smarter and smaller, like Gemma 4 and Qwen3.5, the direction becomes obvious. The future belongs to systems that can run intelligence anywhere, on hardware you control, on hardware you rent intelligently, or on whatever inference capacity is available at that moment.

Model progress is shrinking the moat of centralized inference. Better small models plus better open-weight ecosystems mean the balance shifts toward adaptive systems that can move fluidly between local, workstation, datacenter, and burst cloud environments.

That is the world rtxclaw is built for.

## Architecture

### System prompt layout (mirrors OpenClaw)

Tool-contract guidance lives **with the tool**, in code. Behavioural style lives in operator-editable `.md` files. Both sit above the cache boundary; the contract goes FIRST so the model sees it before lost-in-the-middle attention drops.

Stable-prefix order (above `<!-- RTXCLAW_CACHE_BOUNDARY -->`):

1. `## Tooling` — hardcoded in `rtxclaw_core.system_prompt.tooling_prompt_section`. Carries the `todo` contract: "if 2+ steps, call `todo` first; one `in_progress` at a time; don't restate the plan."
2. `## Execution Bias` — hardcoded, mirrors OpenClaw's `buildExecutionBiasSection` verbatim. Negative reinforcement: "do not finish with a plan/promise when tools can move it forward."
3. `WORKSPACE.md` → `SOUL.md` → `USER.md` → `TOOLS.md` → `TOOLCALL.md` → `AGENTS.md` (`## ⚠️ When to plan` — behavioural triggers only; the contract is upstream).
4. `MODE-AUTO.md` / `MODE-PLAN.md` / `MODE-ASK.md` (one, by active permission mode).
5. `# SKILLS` summary block.
6. `# EXTERNAL TOOLS — MCP / ACP` block (when configured).
7. `# DEFERRED TOOLS` catalog (names + 1-line descriptions; full schemas loaded via `tool_search`).

Below the cache boundary:

- `# WORKSPACE-CONTEXT` (frontend, cwd).
- `# MEMORY.md` (curated long-term memory; main session only).
- `# HEARTBEAT.md` (heartbeat session only).
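
The layout above can be sketched as a simple assembly step: stable sections first, then the boundary marker, then the per-turn sections. Only the marker string comes from this README. The `assemble_prompt` and `stable_prefix` helpers are hypothetical names for this example and are not the functions in `rtxclaw_core.system_prompt`.

```python
CACHE_BOUNDARY = "<!-- RTXCLAW_CACHE_BOUNDARY -->"


def assemble_prompt(stable_sections: list[str], dynamic_sections: list[str]) -> str:
    """Join stable sections, the boundary marker, then per-turn sections."""
    parts = [s.strip() for s in stable_sections if s.strip()]
    parts.append(CACHE_BOUNDARY)
    parts.extend(s.strip() for s in dynamic_sections if s.strip())
    return "\n\n".join(parts)


def stable_prefix(prompt: str) -> str:
    """Return the cacheable head: everything before the boundary marker."""
    head, _, _ = prompt.partition(CACHE_BOUNDARY)
    return head
```

The property that matters for prefix caching is that two prompts built from the same stable sections share a byte-identical head, whatever goes below the boundary. Putting `## Tooling` first in `stable_sections` is what places the contract at byte 0.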

The `## Tooling` + `## Execution Bias` reordering (2026-05-07) was driven by an observed compliance gap: a Qwen3.6-27B-BF16 session on SGLang ignored the planning rule that the same prompt's FP8/vLLM session followed. The rule had been buried at byte ~13 K of a 26 K stable head. With the new layout it sits at byte 0.

### Single-process agent gateway

rtxclaw is moving from "one daemon per agent" to "one gateway, many child subprocesses":

```
                rtxclaw gateway (single parent process)
                ├─ binds 1 ACP HTTP listener (default :20100)
                ├─ child manager: lazy-spawn, idle-timeout, crash-respawn
                └─ /acp/<agent_name>/...  →  child stdio
                                              │
                ┌─────────────────────────────┼─────────────────────────────┐
                ▼                             ▼                             ▼
            main child                  scraper child         …      agent N child
        (rtxclaw agent acp-stdio,    (rtxclaw agent acp-stdio,   (per-agent process,
         CoreBackend(main))            CoreBackend(scraper))      lazy-spawned on first
                                                                  session/new for that name)
```

- **ACP-compliant.** The gateway is itself an ACP server externally (existing `make_acp_app`). Each child is an ACP server over stdio (existing `acp_stdio_main`). The gateway is a per-session multiplexing proxy — no protocol change.
- **Bootstrap context filtered per child.** Sub-agents only get `AGENTS.md` + `TOOLS.md` for context economy (mirrors OpenClaw).
- **Lazy spawn.** Children come up on first `session/new` for their agent name; idle-timeout (default 30 min) reaps them. Bounded concurrency (default 16 live children) so 100+ agent types don't all run at once.
- **Cancel cascade.** Gateway-level abort (SIGTERM, `gateway stop`) fans out to every live child. Per-session ACP `cancel(parent_session_id)` cancels the parent's turn AND any subagent sessions the parent has spawned.
- **Migration path.** Existing `rtxclaw agent start <name>` keeps working as a standalone daemon during transition. Gateway mode is opt-in via `rtxclaw gateway start`.
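
A minimal sketch of the lazy-spawn bookkeeping described in the list above, tracking only names and timestamps rather than real processes. `LazyChildPool` and its methods are hypothetical and are not the gateway's child manager. Raising when the live-child cap is hit is an assumption, since this README does not say what the gateway does in that case.

```python
from dataclasses import dataclass, field


@dataclass
class LazyChildPool:
    idle_timeout_s: float = 1800.0  # 30 min, the default quoted above
    max_live: int = 16              # live-child cap, the default quoted above
    last_used: dict[str, float] = field(default_factory=dict)

    def acquire(self, agent: str, now: float) -> bool:
        """Mark `agent` as used; return True if this call had to spawn it."""
        self.reap(now)
        spawned = agent not in self.last_used
        if spawned and len(self.last_used) >= self.max_live:
            # Assumption: refuse rather than evict a live child.
            raise RuntimeError("too many live children")
        self.last_used[agent] = now
        return spawned

    def reap(self, now: float) -> list[str]:
        """Drop children idle for at least idle_timeout_s; return their names."""
        idle = [a for a, t in self.last_used.items() if now - t >= self.idle_timeout_s]
        for a in idle:
            del self.last_used[a]
        return idle
```

Passing `now` explicitly keeps the sketch deterministic. A real manager would read a monotonic clock and terminate the child process when reaping.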

### Abort & cancel scopes

One of the reasons rtxclaw exists. With a hosted assistant you can't selectively kill *just this branch of work* — you abort the chat or you don't, and the moment you abort you also lose every parallel thread the agent had going. rtxclaw has two cleanly separated abort scopes; nothing in between, on purpose.

**Narrow: `session/cancel` (per-session ACP cancel).**

Cancel an in-flight prompt on one session. What dies:

- the turn currently running in that session
- every subagent session that turn spawned (via `delegate_agent` / future gateway-aware subagent), regardless of which agent child hosts them — so a `main` session that fanned work out to `scraper` and `researcher` cancels all three with one call

What survives:

- every other session on the same agent (parallel chats keep going)
- every session on every other agent (other agents are untouched)
- the agent child processes themselves (warm, ready for the next turn)

This is the granularity an operator actually needs. "Stop this idea, keep everything else running" works without bringing down adjacent work.

**Broad: `rtxclaw gateway stop` (SIGTERM the gateway).**

The single big-red-button. SIGTERM the gateway parent → `ChildManager.shutdown()` SIGTERMs every live agent child within a 10 s deadline → every session on every agent dies. Use when something is genuinely wedged at the host level.

**Why no per-agent middle ground.** A "kill just `main`, leave `scraper` running" command would solve a problem `session/cancel` already covers — if a session on `main` is misbehaving, cancel that session. The agent child process itself is cheap (idle-reaped at 30 min by default), so killing the whole child to abort one session is throwing away warm state for no reason. We can add `gateway kill <agent>` later if a real use case shows up; today it'd be a footgun more than a feature.

**Cleanup is OS-level, not best-effort.**

- Each child agent runs in its own process group (`start_new_session=True`).
- Each `monitor_start`-spawned process runs in its own process group.
- SIGTERM at every layer escalates to SIGKILL after a grace period.
- A wedged child cannot block gateway shutdown — `ChildManager.shutdown()` is bounded; orphaned children get killed by the OS when the gateway exits.

You always know what dies and what doesn't.
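
The SIGTERM-then-SIGKILL escalation on a process group can be sketched with the standard library alone. This is a POSIX-only illustration: `stop_process_group` and the 2-second grace default are assumptions for this example, not the body of `ChildManager.shutdown()`.

```python
import os
import signal
import subprocess


def spawn_in_own_group(argv: list[str]) -> subprocess.Popen:
    # start_new_session=True runs setsid() in the child, so the child leads
    # its own process group and its pgid equals its pid.
    return subprocess.Popen(argv, start_new_session=True)


def stop_process_group(proc: subprocess.Popen, grace_s: float = 2.0) -> int:
    """SIGTERM the whole group, escalate to SIGKILL after grace_s, return exit code."""
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()  # group already gone
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()
```

Because the signal goes to the group rather than to a single pid, anything the child forked inside that group receives it too. On POSIX, `Popen` reports death by signal as a negative return code.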

### Subagent infrastructure (canonical `Agent` tool)

The model-facing surface for spawning subagents is the single Claude-Code-canonical `Agent` tool:

```
Agent(
  subagent_type="general-purpose" | "claude" | "codex" | "gemini",
  description="<short label shown in TUI subagent box>",
  prompt="<full task prompt — subagent does NOT see parent history by default>",
  name="<sibling agent name, only when subagent_type=general-purpose>",
  model="<optional override forwarded to the subagent>",
  cwd="<optional working directory>",
)
```

The runtime routes `Agent(subagent_type=…)` by the discriminator:

| `subagent_type` | Routed to | Notes |
|---|---|---|
| `general-purpose` (or any sibling agent name like `coder`) | sibling rtxclaw agent | runs in its own home dir + tool allowlist; default target `main` |
| `claude` | `claude -p` CLI bridge | full Claude Code session, resumable |
| `codex` | `codex` CLI bridge | OpenAI Codex |
| `gemini` | `@google/gemini-cli` | Google Gemini, long-context + multimodal |

The runtime-only `delegate_claude` / `delegate_codex` / `delegate_gemini` / `delegate_agent` shims are intentionally hidden from the model (`_HIDDEN_LEGACY_TOOL_NAMES` filter on `BUILTIN_TOOLS`) so the model only sees Claude-Code-parity primitives. They remain importable for the slash-command path (`/claude`, `/codex`, `/gemini`) which are operator-driven session-continuation aids, not one-shot subagent dispatches.

**Parallel fan-out.** When the model emits multiple `Agent` tool_use blocks in one response, the runtime's `_build_parallel_groups` + `_execute_batch_parallel` fans them out via `asyncio.gather` across one subprocess per call. Claude does this natively; deepseek / kimi / qwen tend to serialise Agent dispatch when each call carries a long prompt — that's a model-side behaviour, not a runtime limit.
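
The fan-out pattern itself is plain `asyncio.gather`. In the sketch below, `run_agent_call` is a stand-in that only sleeps; in rtxclaw each call would drive one subprocess. Both function names and the `delay_s` field are hypothetical and do not reproduce `_execute_batch_parallel`.

```python
import asyncio


async def run_agent_call(call: dict) -> dict:
    # Stand-in for one subagent dispatch; a real call would await a subprocess.
    await asyncio.sleep(call.get("delay_s", 0))
    return {"call_id": call["call_id"], "ok": True}


async def fan_out(calls: list[dict]) -> list[dict]:
    # gather runs the calls concurrently and preserves input order, so the
    # results line up with the tool_use blocks that produced them.
    return await asyncio.gather(*(run_agent_call(c) for c in calls))
```

Calling `asyncio.run(fan_out(calls))` returns one result per call in the original order, even when later calls finish first.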

**Per-call sidecars.** Every `Agent` invocation captures `{call_id, tool, args, status, ts_start/end, elapsed_s, ok, result}` to `<sessions_dir>/<sid>.subagent.<call_id>.json` independent of the per-engine continuity sidecars (`<sid>.delegate_<kind>.json`). Two layers of evidence: rtxclaw's sidecar frames the call; Claude Code's `~/.claude/projects/<project>/subagents/agent-<agentId>.jsonl` holds the full inner stream (linked by `agentId` inside the rtxclaw sidecar's `result` field).
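
A sketch of what writing one per-call sidecar could look like, using the field names and the file-name pattern quoted above. The `write_subagent_sidecar` helper and the `"done"` / `"error"` status values are assumptions, since this README does not spell out the status vocabulary.

```python
import json
from pathlib import Path


def write_subagent_sidecar(sessions_dir, sid, call_id, tool, args,
                           ok, result, ts_start, ts_end) -> Path:
    """Write <sessions_dir>/<sid>.subagent.<call_id>.json and return its path."""
    record = {
        "call_id": call_id,
        "tool": tool,
        "args": args,
        "status": "done" if ok else "error",  # assumed status values
        "ts_start": ts_start,
        "ts_end": ts_end,
        "elapsed_s": round(ts_end - ts_start, 3),
        "ok": ok,
        "result": result,
    }
    path = Path(sessions_dir) / f"{sid}.subagent.{call_id}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```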

### Monitor tool

Long-running background process registry. Four model-facing tools:

- `monitor_start(command, cwd?)` — spawn the command in its own process group, return a `monitor_id`.
- `monitor_read(monitor_id, max_lines=100, timeout_s=2.0)` — pop unread lines from the buffer; if buffer is empty, waits up to `timeout_s` for the next line OR for the process to exit.
- `monitor_stop(monitor_id)` — SIGTERM the process group, escalate to SIGKILL after a grace period, return the exit code.
- `monitor_list()` — every live monitor + its buffer state. Reads the sidecar so monitors started in earlier tool rounds still surface.

Plus two operator-facing surfaces in the TUI:

- `/monitors` — opens a navigable modal panel: ↑/↓ to navigate, `x` to SIGTERM the selected PID, `r` to reload, `q`/Esc to close. Refreshes itself every 1.5 s so processes that exit elsewhere disappear without manual reload.
- `monitor_list` (model-facing) prints the same data as a one-shot text dump for non-interactive review.

v1 is **pull-based** (the model polls); the buffer is an in-memory rolling window (~5 000 lines per monitor). The push-based variant — each line becomes a session-update notification that wakes the agent between rounds — is a follow-up that needs gateway integration.
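
A pull-based rolling window like the one described can be sketched with `collections.deque`. `LineBuffer` is a hypothetical name, and the sketch leaves out the `timeout_s` wait that `monitor_read` performs when the buffer is empty.

```python
from collections import deque


class LineBuffer:
    def __init__(self, max_lines_kept: int = 5000) -> None:
        # With maxlen set, appending to a full deque drops the oldest line.
        self._lines: deque[str] = deque(maxlen=max_lines_kept)

    def push(self, line: str) -> None:
        self._lines.append(line)

    def pop_unread(self, max_lines: int = 100) -> list[str]:
        """Remove and return up to max_lines of the oldest unread lines."""
        out: list[str] = []
        while self._lines and len(out) < max_lines:
            out.append(self._lines.popleft())
        return out
```

The trade-off of a bounded window is that a reader who polls too rarely loses the oldest output instead of letting memory grow without limit.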

**Cross-subprocess persistence.** Tool runners run in short-lived subprocesses; v1's in-process registry would die between calls. To survive, every `monitor_start` writes an entry to a per-session sidecar at `<sessions_dir>/<parent_sid>.monitors.json`. The actual subprocess keeps running across tool rounds because it was spawned with `start_new_session=True` and is reparented to init when its launcher exits. `monitor_list`, `monitor_stop`, and the TUI panel all read the sidecar and recompute liveness via `os.kill(pid, 0)`, so a process that died after its launcher subprocess exited still gets listed correctly. Each monitor runs in its own process group so `monitor_stop` reaps any subcommands the shell spawned.
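
The `os.kill(pid, 0)` liveness probe mentioned above sends no signal and only performs the existence and permission checks. A minimal POSIX sketch, where `pid_alive` is a hypothetical helper name:

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # exists, but belongs to another user
    return True
```

Two caveats apply to any probe of this kind: a zombie that has not been reaped yet still counts as existing, and a recycled pid can make an unrelated process look like the monitor.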

### TUI dashboard footers

The TUI carries three persistent footer rows that summarise the active turn's state at a glance. All three update live as the model fires tool calls; empty content auto-hides each row so an idle session stays minimal.

- **📐 plan** — populated from the model's `TodoWrite` calls (Claude Code parity). Multi-step plan with per-item status (`☑` done / `▶` in-progress / `☐` pending). Subagent-emitted TodoWrites are gated out of the parent footer — they belong to the subagent's own view.
- **📋 Tasks · N/M done** — populated from the `Task*` tool family (`TaskCreate` / `TaskUpdate` / `TaskList` / `TaskGet`). Persistent task tracker with stable numeric ids, owners, blocks/blockedBy relations. Persisted at `<sessions_dir>/<sid>.tasks.json`. Operator can steer the list directly with the `/task` slash command (`/task add <subject>` / `/task in <N>` / `/task done <N>` / `/task del <N>` / `/task list`) — mutations are mirrored to both the in-memory footer and the on-disk store so the model's next `TaskList` sees them.
- **🤖 Subagents · N/M done · K in flight** — aggregate counter for every `Agent` dispatch in the session. Pinned at the bottom so the operator sees fan-out progress even after individual `SubagentBoxMessage` cards have scrolled out of the viewport.

### Subagent navigation (operator-facing)

Every `Agent` dispatch mounts a bordered `SubagentBoxMessage` in the transcript with a live header: `🤖 Subagent · <label> · running… · N/M tools · last: <ToolName> · <elapsed>s`. The header counter is bumped from the parent dispatcher whenever an inner subagent tool fires (`note_inner_call`) or returns (`note_inner_done`) — the inner widgets themselves are **suppressed from the parent transcript** (deliberate: keeps the parent uncluttered, lets the operator type steering messages without scrolling chaos). The full inner stream still goes to the per-call sidecar.

Navigation:

| Keybinding / command | Action |
|---|---|
| `alt+↑` / `alt+↓` | step focus through every `SubagentBoxMessage` mounted in the transcript (highlights the focused box, scrolls it into view) |
| `alt+Enter` | mount a Static card showing the focused box's sidecar contents inline (prompt + result + elapsed + path on disk) |
| `/subagents` | list every subagent sidecar for the active session: `delegate_agent.<target>.json` (sibling agents), `delegate_<claude/codex/gemini>.json` (CLI bridges), and `subagent.<call_id>.json` (per-call `Agent` captures) |
| `/subagents <N>` | enter / view the Nth row from `/subagents` output |

The two-layer capture means even after the TUI is closed, every subagent run is forensically recoverable: rtxclaw's sidecar JSON frames the call (what the parent asked, when, how long, ✅/❌); Claude Code's `agent-<agentId>.jsonl` (linked by `agentId` inside the sidecar's `result`) holds every inner tool call and assistant message the subagent produced.

### Operator UX details

- **Slash commands are echoed compactly.** When the operator types `/goal <condition>` or any other builtin `.md`-file command, the full expanded markdown body is sent to the model (it needs the execution context) but the transcript only echoes `❯ /goal <condition>` — no 49-line preamble polluting scrollback. Implemented via the `_next_user_echo` stash in `_send_user_turn`.
- **Footers clear on session reset.** `/new` and a failed reattach after `/restart` both wipe `📐 plan`, `📋 Tasks`, and `🤖 Subagents` so the next session starts with a clean slate.
- **`#harness-tasks` task store** gives the local rtxclaw agents parity with Claude Code's `Task` tools. Adding it required registering 4 Tool entries (`TaskCreate` / `TaskUpdate` / `TaskList` / `TaskGet`), implementing their runners (`run_task_create` etc. in `tools/runners.py`), and adding a per-session JSON sidecar at `<sid>.tasks.json`. The local `main` agent (running deepseek / kimi / qwen) can now use the same `Task*` primitives Claude Code's harness offers, with synonym-tolerant arg parsing (`taskId` ⇄ `id` ⇄ `task_id`, `subject` ⇄ `title` ⇄ `name`) so smaller-context LLMs land on the first try.
- **Operator requirements** are tracked in [`docs/REQUIREMENTS.md`](docs/REQUIREMENTS.md) — every TUI / runtime change raised by an operator is logged there with a `done` / `partial` / `todo` status so regressions stay attributable.
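
The synonym-tolerant argument parsing mentioned in the task-store item above can be sketched as a normalization pass that runs before the runner sees the arguments. The alias groups come from this README. The choice of `task_id` and `subject` as canonical keys, and the rule that the first alias listed wins when several are present, are assumptions.

```python
SYNONYMS = {
    "task_id": ("taskId", "id", "task_id"),
    "subject": ("subject", "title", "name"),
}


def normalize_task_args(args: dict) -> dict:
    """Fold alias keys into one canonical key; unrelated keys pass through."""
    out = dict(args)
    for canonical, aliases in SYNONYMS.items():
        for alias in aliases:
            if alias in out:
                value = out.pop(alias)
                # The first alias found sets the value; later ones are dropped.
                out.setdefault(canonical, value)
    return out
```

With this in place, `{"id": 3, "title": "x"}` and `{"taskId": 3, "subject": "x"}` both reach the runner in the same shape.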

### Logging

Every tool call is instrumented to append a structured event line to the agent's `gateway.log` (`<RTXCLAW_AGENT_HOME>/logs/gateway.log`). Format:

```
2026-05-07T13:39:56Z MONITOR_START monitor_id="2dae1ed81a36" pid=2377452 command="…" cwd=null
2026-05-07T13:39:56Z MONITOR_EXIT  monitor_id="2dae1ed81a36" exit_code=0 buffered_lines=24 unread=12 cancelled=false
2026-05-07T13:39:56Z MONITOR_STOP  monitor_id="2dae1ed81a36" cross_subprocess=true was_alive=true pid=2377452
2026-05-07T13:39:57Z DELEGATE_AGENT_START   target="scraper" parent_session="69fc…" task_chars=82
2026-05-07T13:39:58Z DELEGATE_AGENT_SPAWNED target="scraper" child_pid=2378001 cwd="/home/…"
2026-05-07T13:40:05Z DELEGATE_AGENT_DONE    target="scraper" stop_reason="end_turn" reply_chars=482
```

Greppable by event prefix. Best-effort writes — a logfile that's been rolled away or is unwritable will not break the tool call. The session JSON remains the authoritative trace; this log is auxiliary telemetry for cross-call debugging.

### Reliability — Telegram bot ↔ gateway recovery

The Telegram bot (`rtxclaw-telegram.service`) and the agent gateway (`rtxclaw-gateway.service`) are independent systemd units. They can restart in either order without bringing each other down — but only because the bot now treats every gateway-side identity as "live until proven otherwise" and re-establishes anything that disappeared.

Three failure modes the bot now tolerates without operator intervention:

1. **Idle reap of a chat session.** Children GC sessions after `child_idle_timeout_s` (default 1800 s). A chat that goes quiet for 30+ minutes returns to a gateway that has never heard of its sid. Before recovery: `session_prompt` raised `LookupError: no live child owns session …` and the bot returned `⚠️ agent error — see telegram.log`. Now the bot calls `session/load` (or `session/new` as a fallback), updates the chat's persisted sid, and re-issues the prompt — the user sees the model answer, not the error.
2. **Gateway parent restart between turns.** A redeploy / `systemctl restart rtxclaw-gateway` / OOM kill resets the in-memory `child.sessions` map. Same symptom as (1), same fix path.
3. **SSE stream torn down by gateway crash.** The bot's `AcpClient` holds a long-lived GET SSE pump. If the gateway dies, the TCP socket lands in `CLOSE-WAIT` and the bot's read loop exits — but the cached client used to stay in `self._clients`, so the next prompt would either hang on a dead future or fail with `AcpTransportClosed` on every retry. The fix has three parts:
   - `AcpClient._read_loop` now flips `self._closed = True` in its `finally`, not only `close()`.
   - `AcpClient._reserve_call` rejects calls on a closed client with `AcpTransportClosed("client is closed")` instead of allocating a future the (already-exited) read loop will never satisfy.
   - The bot's `_get_or_open_client` evicts cached clients with `_closed=True` and reopens; the `_dispatch_blocks` recovery path also drops the dead client mid-turn before re-running the prompt.

The recovery is bounded: at most **one retry per turn**, and only when nothing has streamed yet (no content / thoughts / tool headers). A failure mid-stream surfaces normally — re-running would emit duplicate output to the user.
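
The bounded retry can be sketched as a wrapper around the prompt call. Everything here is hypothetical: `prompt_with_recovery`, the `send_prompt(sid, text, emit)` callback shape, and `reopen_session` are invented for this example, and only `LookupError` is handled, whereas the bot also recovers from a closed transport.

```python
def prompt_with_recovery(send_prompt, reopen_session, sid, text):
    """Run a prompt; on a lost session retry once, but only if nothing streamed."""
    streamed = []  # chunks already shown to the user during this turn
    try:
        return send_prompt(sid, text, streamed.append)
    except LookupError:
        if streamed:
            # Output already reached the user; re-running would duplicate it.
            raise
        new_sid = reopen_session(sid)
        # Single retry: a second failure propagates to the caller.
        return send_prompt(new_sid, text, streamed.append)
```

Keeping the retry outside any loop makes the one-retry bound structural rather than something a counter has to enforce.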

The `/agent <name>` command verifies the saved session up-front via `session/load` before promising "Resuming saved session …". An invalid binding now silently rolls over to a fresh session at switch time instead of failing on the first message after the switch.

### Group-chat reply gates

Two `<telegram_home>/config.json` knobs control when the bot speaks up in shared groups. Both default to "off" so an existing single-operator deployment behaves exactly as before.

```jsonc
{
  "allowed_chat_ids":  [-1003916299625],   // chat-level allowlist (existing)
  "allowed_user_ids":  [8484692594],       // NEW — per-user allowlist (silent drop)
  "mention_required_in_groups": true       // NEW — only reply when @-mentioned in groups
}
```

`allowed_user_ids` (silent per-user drop) — when non-empty, a message in an allowed chat must also come from one of these `from.id`s. Messages from any other group member are dropped silently (no "you're not allowed" reply, because that would advertise the bot to everyone reading the chat). An empty list disables the gate, so every member of an allowed chat may interact, matching the legacy behavior.

`mention_required_in_groups` (silent group gate) — when true, in non-private chats the bot ignores everything that doesn't address it. Three signals count as "addressed":

1. `@<botname>` anywhere in the raw text (covers `/cmd@<botname>` and conversational mentions).
2. The message is a direct reply (`reply_to_message`) to one of the bot's own messages — Telegram's swipe-to-reply convention.
3. A `text_mention` entity targets the bot's user id (a Telegram client mentioned the bot without typing the @handle, e.g. tap-to-mention from the member list).

DMs (`chat.type == "private"`) are always exempt from this gate — there's no other recipient there to confuse, so requiring a mention would just be friction.
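
Both gates can be sketched as one predicate over a Telegram Bot API message dict. The `should_reply` function is hypothetical and assumes the chat-level `allowed_chat_ids` check has already passed upstream. Matching the `@<botname>` handle case-insensitively is also an assumption.

```python
def should_reply(msg: dict, cfg: dict, bot_id: int, bot_username: str) -> bool:
    """Apply the per-user allowlist, then the group mention gate."""
    chat = msg.get("chat") or {}
    sender = msg.get("from") or {}

    allowed_users = cfg.get("allowed_user_ids") or []
    if allowed_users and sender.get("id") not in allowed_users:
        return False  # silent drop, no reply at all

    if not cfg.get("mention_required_in_groups", False):
        return True
    if chat.get("type") == "private":
        return True  # DMs are always exempt from the mention gate

    text = msg.get("text") or ""
    if f"@{bot_username}".lower() in text.lower():
        return True  # signal 1: @handle anywhere in the raw text
    reply = msg.get("reply_to_message") or {}
    if (reply.get("from") or {}).get("id") == bot_id:
        return True  # signal 2: direct reply to one of the bot's messages
    for ent in msg.get("entities") or []:
        if ent.get("type") == "text_mention" and (ent.get("user") or {}).get("id") == bot_id:
            return True  # signal 3: text_mention entity targeting the bot
    return False
```

With both knobs unset the predicate returns `True` for every message, which matches the default described above.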

Both knobs are settable via `rtxclaw configure`:

```bash
rtxclaw configure --telegram-allowed-user-ids "8484692594,123456789"
rtxclaw configure --telegram-mention-required on
# Clear the per-user list (revert to "any member of an allowed chat"):
rtxclaw configure --telegram-allowed-user-ids ""
```

The bot reads config at startup; restart with `sudo systemctl restart rtxclaw-telegram.service` after editing.

### Diagnosing the recovery path

When the bot exercises any of the three fallbacks above it logs to `~/.rtxclaw/telegram/logs/telegram.log`:

```
WARNING session_prompt: gateway lost session chat=… sid=… — attempting reopen (…)
INFO    session reopen via session/load chat=… sid=…           # success: same sid restored
INFO    session reopen via session/new chat=… old_sid=… new_sid=…   # fallback: new sid bound
WARNING evicting closed ACP client for chat_key=…; reopening   # transport-died path
```

Operator runbook for "bot replied `agent error` once and then started working":

1. Find the message in `telegram.log` — recovery emits the WARNING + INFO above.
2. If the gateway side reaped the session, check `child_idle_timeout_s` in `~/.rtxclaw/config.json` (default 1800 s); raise it if 30 min is too short for the conversation cadence.
3. If the gateway parent itself died, `journalctl -u rtxclaw-gateway.service --since "10 min ago"` shows whether it was a clean restart, a SIGTERM, or a SIGKILL. A SIGKILL with no kernel OOM in `dmesg` and no `MemoryMax` set usually means manual `systemctl restart` during dev work.

`MemoryMax` on the gateway unit is `infinity` by default — there is no in-process memory cap. If you want one, set it under `[Service]` in `deploy/systemd/rtxclaw-gateway.service` (e.g. `MemoryMax=2G`); the recovery logic above handles a cgroup-OOM kill identically to a manual restart.

### Self-improver

rtxclaw can analyse its own sessions and propose code fixes via
isolated git worktrees. Install per host:

    python -m rtxclaw_agent self-improver install --agent main

This copies the hook wrapper to `~/.rtxclaw/scripts/`, drops a default
`hooks.json` into `~/.rtxclaw/agents/main/`, registers a 15-minute
cron entry for the cold-path scanner, and prepares the worktree root
under `~/.rtxclaw/self-improver/worktrees/`. See
`docs/superpowers/specs/2026-05-17-rtxclaw-self-improver-design.md`
for the design.

## The vision

rtxclaw is not a neocloud wrapper.
rtxclaw is not a dependency engine.
rtxclaw is not permissioned intelligence.

rtxclaw is sovereign inference infrastructure for the agent era.

It is a system where:

- your data stays under your control
- your agents adapt to your hardware reality
- your costs are shaped by intelligent routing, not vendor extraction
- your stack remains understandable enough to audit and modify
- your business does not collapse because someone else throttled access to intelligence

The AI future will not belong only to the largest labs.
It will belong to those who can route, compress, adapt, and deploy intelligence with discipline.

rtxclaw is built for that future.
