Metadata-Version: 2.4
Name: rosclaw-how
Version: 1.0.2
Summary: Online deadlock-breaker — injects targeted engineering heuristics from rosclaw-know assets into stalling agents.
Project-URL: Homepage, https://rosclaw.io
Project-URL: Repository, https://github.com/ros-claw/rosclaw-how
Project-URL: Documentation, https://docs.rosclaw.io
Project-URL: Bug Tracker, https://github.com/ros-claw/rosclaw-how/issues
Author-email: ROSClaw Team <team@rosclaw.io>
License: MIT
Requires-Python: >=3.11
Requires-Dist: fastapi>=0.110
Requires-Dist: numpy>=1.24
Requires-Dist: pydantic>=2.6
Requires-Dist: pyseekdb>=1.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: sentence-transformers>=2.7
Requires-Dist: uvicorn[standard]>=0.27
Provides-Extra: all
Requires-Dist: httpx>=0.27; extra == 'all'
Requires-Dist: pytest-asyncio>=0.23; extra == 'all'
Requires-Dist: pytest>=8; extra == 'all'
Requires-Dist: ruff>=0.5; extra == 'all'
Provides-Extra: dev
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Description-Content-Type: text/markdown

# ROSClaw-How

**Online deadlock-breaker** for agents stuck on engineering-optimization tasks,
with a feedback loop that lets the assets refine themselves over time.

Sister project: **rosclaw-know** (offline refinery that produces the
`bridge_index.json` + `code_patterns/` assets this service serves at runtime,
and consumes the outcome JSONL this service exports to drive the next
publish cycle).

## What it does

When an agent's verifier score plateaus or a physical safety symptom appears in
its error log, this service injects a small, targeted hint into its next prompt.
Three strategies, decided server-side:

| Strategy           | Trigger                                  | Returned payload                                  |
|--------------------|------------------------------------------|---------------------------------------------------|
| `SAFETY`           | Error log mentions a safety symptom      | Hard-coded constraint (~50–100 tokens)            |
| `FREE_EXPLORATION` | First 3 iterations, or score improving   | Empty string — keep exploring                     |
| `CATALYST`         | Score plateau / regression               | Cross-domain analogy + diff (≤ 400 tokens)        |

Runtime is pure rules + a single vector lookup — **zero LLM calls**. The
CATALYST path returns an `injection_id` so the agent can later report whether
the hint helped (`POST /wiki/v1/prompt/feedback`); the resulting outcomes drive
per-pattern uplift statistics and soft-deprecation of under-performing
patterns.
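
The decision table above can be sketched as a small pure function. This is only an illustration, not the code in `state_router.py`: the safety keyword list is hypothetical, iterations are assumed to be 1-indexed, and "improving" is assumed to mean that the latest score beats the one before it.

```python
from typing import Sequence

# Hypothetical keyword list; the real symptom detection is not shown in this README.
SAFETY_KEYWORDS = ("collision", "emergency stop", "joint limit exceeded")
WARMUP_ITERATIONS = 3  # "first 3 iterations" from the strategy table


def choose_strategy(
    error_log: str,
    previous_scores: Sequence[float],
    current_iteration: int,
) -> str:
    """Return SAFETY, FREE_EXPLORATION, or CATALYST for one build request."""
    lowered = error_log.lower()
    if any(keyword in lowered for keyword in SAFETY_KEYWORDS):
        return "SAFETY"  # a safety symptom takes precedence over score trends
    if current_iteration <= WARMUP_ITERATIONS or len(previous_scores) < 2:
        return "FREE_EXPLORATION"  # too early to judge a plateau
    # Assumption: "improving" compares only the last two scores.
    if previous_scores[-1] > previous_scores[-2]:
        return "FREE_EXPLORATION"
    return "CATALYST"  # plateau or regression


if __name__ == "__main__":
    print(choose_strategy("ERROR: torque overflow on joint 2",
                          [0.42, 0.47, 0.47, 0.46], 4))
```

Keeping the classifier free of I/O makes the "zero LLM calls" property easy to test: the same inputs always produce the same strategy.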

## Feedback loop (the “push + learn” cycle)

```
                  ┌─────────────────────────────────────────┐
                  │ rosclaw-know  (offline refinery)        │
                  │   awesome_fetcher → new raw corpus      │
                  │   active_learning → autodraft → ingest  │
                  │   feedback_distill.py → pattern_metrics │
                  │   bridge_reweighter   → priority=-1     │
                  │                                         │
                  │   writes bridge_index.json (+priority)  │
                  └───────────────────┬─────────────────────┘
                                      │ asset publish
                                      ▼
   ┌─────────────────────────── rosclaw-how ───────────────────────────┐
   │ asset_loader        delta-sync bridge_index → SeekDB               │
   │ SemanticRouter      skips clusters with priority < 0               │
   │                                                                    │
   │ POST /wiki/v1/prompt/build       → snippet + injection_id          │
   │ POST /wiki/v1/prompt/feedback    → post_score, delta_score         │
   │ GET  /wiki/v1/stats              → bucketed uplift / win_rate      │
   │ GET  /wiki/v1/blind_spots        → recurring Unknown_Error gaps    │
   │ GET  /wiki/v1/outcomes/export    → NDJSON stream for offline pipe  │
   │ POST /wiki/v1/admin/reload       → hot-reload assets               │
   │ POST /wiki/v1/admin/promote      → maturity gate (staging→prod)    │
   └────────────────────────────────────────────────────────────────────┘
                                      │ NDJSON export
                                      ▼
                  ┌─────────────────────────────────────────┐
                  │ rosclaw-know   data/exports/*.jsonl     │
                  │   distill_feedback.py → re-publish ↻    │
                  └─────────────────────────────────────────┘
```

Closed-loop validation: 6/6 stuck-rollout scenarios pass the replay benchmark
(`scripts/replay_benchmark.py` on the rosclaw-know side) — bad patterns get
`priority=-1`, vanish from the next CATALYST lookup, and good patterns keep
their slot.

## Quick start

```bash
cd rosclaw-how
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

cp .env.example .env
# (optional) symlink the assets from a finished rosclaw-know run:
ln -s ../../rosclaw-know/data/assets data/assets

# Run tests
pytest -q

# Start the server
python scripts/run_server.py
# → POST http://localhost:47820/wiki/v1/prompt/build
```

### Deploying on a memory-constrained host

`pyseekdb` embedded mode boots a small OceanBase-like observer in-process
(~1.5–2 GB RAM after warmup, plus a 4 GB datafile reservation on disk).
If you are running on a host where another embedded SeekDB instance is
already up (e.g., the legacy `rosclaw-wiki` service), or where the box is
too small to fit a second observer, switch to server mode:

```bash
# Connect to an external SeekDB server (e.g. a remote cluster)
SEEKDB_MODE=server
SEEKDB_HOST=10.0.0.5
SEEKDB_PORT=2881
SEEKDB_TENANT=sys           # OceanBase tenant name
SEEKDB_USER=root
SEEKDB_PASSWORD=…
```

The SAFETY and FREE_EXPLORATION paths never touch SeekDB, so they remain
available even when the database is unreachable. The CATALYST path falls
back to `InMemoryRouter` (numpy cosine over `bridge_index.json`) when
`ROSCLAW_HOW_ROUTER_BACKEND=inmemory` is set — useful when SeekDB is down
or absent.

### Auto-create database

On first boot, `seekdb_client._ensure_database_exists` runs an idempotent
`CREATE DATABASE IF NOT EXISTS <SEEKDB_DATABASE>` via `pyseekdb.AdminClient`
before opening the data-plane client. This means a fresh embedded SeekDB
(which only ships the `test` database by default) bootstraps cleanly with
`SEEKDB_DATABASE=rosclaw_how` without manual setup.

## API

All endpoints share the `/wiki/v1` prefix kept for compatibility with the
legacy rosclaw-wiki API.

### `POST /wiki/v1/prompt/build`

Auth: `X-API-Key`.

Request:
```json
{
  "error_log": "ERROR: torque overflow on joint 2",
  "previous_scores": [0.42, 0.47, 0.47, 0.46],
  "current_iteration": 4
}
```

Response on a CATALYST hit (the `injection_id` is the handle for the
follow-up feedback call):

```json
{
  "prompt_snippet": "## 🔧 Engineering Heuristics from ROSClaw-How ...",
  "injected": true,
  "strategy": "CATALYST",
  "symptom": "Oscillation_Divergence",
  "matched_symptom": "Commanded velocity diverges to ±∞ ...",
  "similarity": 0.5864,
  "injection_id": "0fd3eb2bd37c461490c4f43def243512",
  "pattern_id": "pattern_output_saturation_clamp",
  "latency_ms": 199
}
```

When the matched cluster is in **staging** (`priority=0`), the response
includes `is_staging: true` so the agent knows the pattern has not yet
been promoted to production:

```json
{
  "strategy": "CATALYST",
  "is_staging": true,
  "pattern_id": "pattern_20260518_1bfb99e13c"
}
```

Production clusters (`priority=1` or unset) omit the key entirely for
backward compatibility.

SAFETY and FREE_EXPLORATION responses omit `injection_id` / `pattern_id`.

### `POST /wiki/v1/prompt/feedback`

Auth: `X-API-Key`. Returns **204 No Content** on success, **404** if the
`injection_id` is unknown.

```json
{
  "injection_id": "0fd3eb2bd37c461490c4f43def243512",
  "post_score": 0.83,
  "iterations_to_resolve": 3,
  "agent_notes": "anti-windup clamp fixed it"
}
```

The server computes `delta_score = post_score - pre_score` (where `pre_score`
is the last entry from the original `previous_scores`).
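
As a worked example of that formula, the sketch below recomputes `delta_score` from the sample payloads in this README. The function name and the error raised for an empty score history are assumptions, not the service's actual behaviour.

```python
from typing import Sequence


def delta_score(previous_scores: Sequence[float], post_score: float) -> float:
    """delta_score = post_score - pre_score, with pre_score = last previous score."""
    if not previous_scores:
        # Assumption: an empty history has no pre_score to compare against.
        raise ValueError("previous_scores must contain at least one entry")
    pre_score = previous_scores[-1]
    return post_score - pre_score


if __name__ == "__main__":
    # Scores from the /prompt/build request and post_score from the feedback body.
    print(round(delta_score([0.42, 0.47, 0.47, 0.46], 0.83), 2))
```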

### `GET /wiki/v1/stats`

Public, no auth. Aggregates **finalised** outcomes (those that have received
feedback) per `pattern_id`, grouped by maturity bucket:

```json
{
  "staging": {
    "pattern_20260518_1bfb99e13c": {
      "n": 5,
      "avg_uplift": 0.142,
      "win_rate": 0.8,
      "last_seen_iso": "2026-05-18T19:14:22+00:00"
    }
  },
  "production": {
    "pattern_output_saturation_clamp": {
      "n": 8,
      "avg_uplift": 0.157,
      "win_rate": 0.875,
      "last_seen_iso": "2026-05-18T19:14:22+00:00"
    }
  },
  "demoted": {
    "pattern_bad_habit": {
      "n": 12,
      "avg_uplift": -0.03,
      "win_rate": 0.25,
      "last_seen_iso": "2026-05-18T19:14:22+00:00"
    }
  },
  "unbucketed": {}
}
```

`win_rate = sum(delta_score > 0.05) / n`.

The `unbucketed` catch-all holds `pattern_id` values whose owning cluster was
deleted or renamed since the outcome was recorded.
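
The per-pattern numbers can be reproduced offline with a few lines of Python. The sketch below assumes each outcome row is a dict with `pattern_id` and `delta_score` keys and that pending rows carry `delta_score = None`; that row shape is an assumption, and bucketing by maturity is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

WIN_THRESHOLD = 0.05  # from the win_rate definition above


def aggregate_outcomes(rows: Iterable[dict]) -> Dict[str, dict]:
    """Compute n, avg_uplift and win_rate per pattern_id over finalised rows."""
    deltas_by_pattern: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        delta: Optional[float] = row.get("delta_score")
        if delta is None:
            continue  # still pending: no feedback received yet
        deltas_by_pattern[row["pattern_id"]].append(delta)

    stats: Dict[str, dict] = {}
    for pattern_id, deltas in deltas_by_pattern.items():
        n = len(deltas)
        stats[pattern_id] = {
            "n": n,
            "avg_uplift": sum(deltas) / n,
            "win_rate": sum(1 for d in deltas if d > WIN_THRESHOLD) / n,
        }
    return stats
```

Note that a small positive `delta_score` (0.05 or less) counts toward `avg_uplift` but not toward `win_rate`.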

### `GET /wiki/v1/outcomes/export`

Auth: `X-API-Key`. Streams every outcome (including still-pending ones) as
newline-delimited JSON. Query params:

- `since` — ISO 8601 timestamp; only rows with `ts >= since` are emitted.
- `limit` — optional row cap (max 100 000).

```bash
curl -H "X-API-Key: $ROSCLAW_HOW_API_KEY" \
     "http://127.0.0.1:47820/wiki/v1/outcomes/export?since=2026-05-17T00:00:00+00:00" \
     -o outcomes.jsonl
```

The same content is also produced by `scripts/export_outcomes.py`, which is
a thin CLI wrapper around this endpoint (it used to read SeekDB directly
but deadlocked against the embedded server's process-exclusive lock —
fixed in Phase 4).
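
Once `outcomes.jsonl` is on disk, it can be read back line by line. The sketch below is a hypothetical client-side helper (not part of this repo) that parses NDJSON and re-applies the `ts >= since` cut; it assumes every row carries an ISO 8601 `ts` field with a UTC offset, as in the examples here.

```python
import json
from datetime import datetime
from typing import Iterable, Iterator, Optional


def iter_outcomes(lines: Iterable[str], since: Optional[str] = None) -> Iterator[dict]:
    """Yield outcome rows from NDJSON lines, keeping rows with ts >= since."""
    cutoff = datetime.fromisoformat(since) if since else None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        row = json.loads(line)
        if cutoff is not None and datetime.fromisoformat(row["ts"]) < cutoff:
            continue
        yield row
```

Typical use would be `with open("outcomes.jsonl") as fh: rows = list(iter_outcomes(fh))`.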

### `GET /healthz`

Public, no auth. Operational snapshot:

```json
{
  "status": "ok",
  "version": "0.1.0",
  "auth_enabled": true,
  "seekdb_mode": "embedded",
  "router_backend": "seekdb",
  "cluster_count": 349,
  "embedding_dim": 384,
  "bridge_index_mtime": "2026-05-18T18:28:04+00:00",
  "similarity_floor": 0.5,
  "blind_spot_count": 0
}
```

`blind_spot_count` is the number of `Unknown_Error` prefix buckets that
have crossed the recurrence threshold within the active sliding window
(see `GET /wiki/v1/blind_spots` below).

### `POST /wiki/v1/admin/reload`

Auth required (`X-API-Key`). Re-reads `bridge_index.json` and
`code_patterns/` into SeekDB without bouncing the server. Body is
optional:

```bash
curl -X POST http://127.0.0.1:47820/wiki/v1/admin/reload \
     -H "X-API-Key: $ROSCLAW_HOW_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{}'             # incremental (delta) reload
```

`{"rebuild": true}` drops both SeekDB collections first; default is an
idempotent incremental upsert. The loader fingerprints each cluster
(`standard_name` + sorted patterns + sorted keywords + canonical-JSON
analogies + `priority`) with SHA-256 — unchanged rows skip the
sentence-transformer encode call entirely. On a 350-cluster bundle this
turns a ~4-minute full reload into a ~20-second no-op when nothing has
changed.
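
A content fingerprint of this kind might look like the sketch below. The dictionary keys `keywords` and `analogies` are assumed names for the fields listed above, and the exact canonicalisation used by `asset_loader.py` may differ.

```python
import hashlib
import json


def cluster_fingerprint(cluster: dict) -> str:
    """SHA-256 over the fields that should trigger a re-encode when they change."""
    payload = {
        "standard_name": cluster.get("standard_name", ""),
        "patterns": sorted(cluster.get("associated_patterns", [])),
        "keywords": sorted(cluster.get("keywords", [])),  # assumed key name
        "analogies": cluster.get("analogies", []),        # assumed key name
        "priority": cluster.get("priority"),
    }
    # Canonical JSON: sorted keys and fixed separators give a stable byte string.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting the list fields means a reordered but otherwise identical cluster keeps its fingerprint, so the expensive encode call is skipped.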

Rows whose IDs disappeared from the bridge — or whose `priority` flipped
to `-1` (soft-deprecated) — are deleted from SeekDB. The response
exposes both the alive totals and the per-bucket counters so dashboards
can show "what just happened":

```json
{
  "symptoms": 349,
  "patterns": 352,
  "demoted_skipped": 3,
  "symptoms_detail": {"added": 16, "updated": 0, "unchanged": 333, "deleted": 0},
  "patterns_detail": {"added": 0, "updated": 0, "unchanged": 352, "deleted": 0},
  "rebuild": false,
  "duration_ms": 23900
}
```
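
The `added` / `updated` / `unchanged` / `deleted` counters follow from comparing stored fingerprints with incoming ones. A minimal sketch, assuming both sides are plain `id -> fingerprint` mappings and that demoted rows were already removed from the incoming side:

```python
from typing import Dict


def diff_fingerprints(stored: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, int]:
    """Classify row ids the way the reload response reports them."""
    added = [key for key in incoming if key not in stored]
    updated = [key for key in incoming if key in stored and stored[key] != incoming[key]]
    unchanged = [key for key in incoming if key in stored and stored[key] == incoming[key]]
    deleted = [key for key in stored if key not in incoming]
    return {
        "added": len(added),
        "updated": len(updated),
        "unchanged": len(unchanged),
        "deleted": len(deleted),
    }
```

Only the `added` and `updated` rows would need a fresh embedding; `unchanged` rows are the ones that make a no-op reload fast.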

After loading, the cached `SemanticRouter` is rebuilt synchronously so
`/healthz` immediately reports the fresh `cluster_count` / `router_backend`
(rather than reporting null values until the next CATALYST request).

### `POST /wiki/v1/admin/promote`

Auth required (`X-API-Key`). Bump or set a cluster's maturity priority.

Body accepts exactly one of `delta` (relative change) or `priority`
(absolute set). The lookup key is the `pattern_id` (one of the `*.md`
files in `code_patterns/`); the endpoint walks `bridge_index.json` to
find the *owning* cluster whose `associated_patterns` list contains the
given `pattern_id`.

```bash
# Relative bump (capped to [-1, +1])
curl -X POST http://127.0.0.1:47820/wiki/v1/admin/promote \
     -H "X-API-Key: $ROSCLAW_HOW_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"pattern_id": "pattern_20260518_1bfb99e13c", "delta": 1}'

# Absolute set (also capped)
curl -X POST http://127.0.0.1:47820/wiki/v1/admin/promote \
     -H "X-API-Key: $ROSCLAW_HOW_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"pattern_id": "pattern_20260518_1bfb99e13c", "priority": 1}'
```

Response:
```json
{
  "pattern_id": "pattern_20260518_1bfb99e13c",
  "cluster_id": "20260518_1bfb99e13c",
  "old_priority": 0,
  "new_priority": 1
}
```

On success, the endpoint:
1. Atomically updates `bridge_index.json`
2. Appends one JSONL row to `data/audit_log.jsonl`
3. Re-upserts the cluster's metadata in SeekDB so the router sees the
   change immediately
4. Invalidates the cached router

Priority semantics:
| Value | Meaning |
|---|---|
| `-1` | Demoted — runtime skips (soft-deprecated) |
| `0`  | Staging — runtime injects with `is_staging=true` |
| `+1` | Production — normal, no flag |
| unset | Backward compat — treated as production |

Returns **404** when `pattern_id` is not found in any cluster, and **422**
when both or neither of `delta` and `priority` are provided.
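
The `delta` / `priority` rules can be summarised in one function. This is a sketch rather than the endpoint's code: the function name is hypothetical, a `ValueError` stands in for the 422 response, and treating an unset priority as `+1` before applying a relative bump is an assumption based on the table above.

```python
from typing import Optional


def resolve_priority(
    old_priority: Optional[int],
    delta: Optional[int] = None,
    priority: Optional[int] = None,
) -> int:
    """Return the new priority, clamped to the range [-1, +1]."""
    if (delta is None) == (priority is None):
        # Both or neither supplied: the API answers 422 in this case.
        raise ValueError("provide exactly one of 'delta' or 'priority'")
    if delta is not None:
        base = 1 if old_priority is None else old_priority  # unset == production
        target = base + delta
    else:
        target = priority
    return max(-1, min(1, target))
```

For example, `resolve_priority(0, delta=1)` promotes a staging cluster to production, and a further `delta=1` leaves it at `+1`.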

### `GET /wiki/v1/blind_spots`

Public, no auth. Sliding-window summary of recurring `Unknown_Error`
prefixes — i.e. errors the catalyst layer keeps seeing but has no
matching cluster for. This is the work-list for the rosclaw-know triage
queue: each entry corresponds to a pattern we should be teaching.

```json
{
  "window_seconds": 3600,
  "threshold": 3,
  "active": [
    {
      "prefix_hash": "1a3c…",
      "count": 7,
      "first_seen": "2026-05-18T19:18:00+00:00",
      "last_seen": "2026-05-18T19:54:12+00:00",
      "sample_excerpt": "RuntimeError: undocumented quirk in controller stage",
      "is_blind_spot": true
    }
  ],
  "total_unique_prefixes": 4,
  "total_events": 13
}
```

Each crossing event also appends one JSONL row to
`data/blind_spots.jsonl` (configurable via
`ROSCLAW_HOW_BLIND_SPOTS_PATH`). A prefix is only emitted *once per
window* — if it goes quiet for the window length and recurs later, the
next crossing produces a fresh row.

Tuning knobs (env vars, defaults shown):

| Variable | Default | Purpose |
|---|---|---|
| `ROSCLAW_HOW_BLIND_SPOT_WINDOW` | `3600` | sliding-window length in seconds |
| `ROSCLAW_HOW_BLIND_SPOT_THRESHOLD` | `3` | events needed to flag a prefix |
| `ROSCLAW_HOW_BLIND_SPOTS_PATH` | `data/blind_spots.jsonl` | persistent log |
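
The window and threshold semantics can be sketched with a deque of timestamps per prefix. This is an illustrative model, not `blind_spots.py` itself: the class name, the 80-character prefix length, and the truncated SHA-256 key are all assumptions.

```python
import hashlib
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class BlindSpotTracker:
    """Flags an error prefix once when it recurs `threshold` times in the window."""

    def __init__(self, window_seconds: float = 3600, threshold: int = 3,
                 prefix_len: int = 80) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.prefix_len = prefix_len  # assumed prefix length
        self._events: Dict[str, Deque[float]] = defaultdict(deque)
        self._flagged: Set[str] = set()

    def record(self, error_log: str, now: float) -> bool:
        """Record one Unknown_Error event; True only on a fresh threshold crossing."""
        prefix = error_log[: self.prefix_len]
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()[:12]
        events = self._events[key]
        # Drop timestamps that have slid out of the window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        if not events:
            self._flagged.discard(key)  # quiet for a full window: may fire again
        events.append(now)
        if len(events) >= self.threshold and key not in self._flagged:
            self._flagged.add(key)
            return True  # the caller would append one JSONL row here
        return False
```

Passing `now` explicitly keeps the tracker deterministic in tests; a caller would normally supply `time.time()`.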

### `GET /ui`

Public, no auth. Single-page operator dashboard. Vanilla HTML + JS, no
external CDN; polls `/healthz`, `/wiki/v1/stats`, and
`/wiki/v1/blind_spots` every 5 seconds and renders:

* **Health KPIs** — version, router backend, cluster count, embedding
  dim, similarity floor, bridge mtime, live blind-spot count.
* **Pattern uplift table** — sortable by bucket (`staging` / `production` /
  `demoted` / `unbucketed`), per-pattern `n / avg_uplift / win_rate /
  last_seen` with an inline bar for the uplift magnitude.
* **Blind spots** — current recurring `Unknown_Error` prefixes (only
  those past threshold), with their hash, last-seen timestamp, and a
  truncated sample excerpt for triage.

Useful as a smoke check during deployments and as a low-friction view
into the feedback loop without spinning up a full Grafana stack.

## Architecture

```
rosclaw-know (offline)                rosclaw-how (online, this repo)
─────────────────                     ───────────────────────────────
Reads  6,097 wiki/*.md                Reads SeekDB at runtime
Writes data/assets/bridge_index.json  Loads assets at startup
       data/assets/code_patterns/*    Serves build / feedback / stats / export
Reads  data/exports/*.jsonl           Writes outcome rows on feedback
       (closing the loop)
─────────────────                     ───────────────────────────────
                          ▶ ▶ ▶  assets travel from know → how
                          ◀ ◀ ◀  outcomes travel from how → know
```

Source layout:

```
src/rosclaw_how/
  __init__.py
  api.py              FastAPI app: 9 endpoints
  asset_loader.py     Startup load + --rebuild; delta-sync with content-hash
  auth.py             API-key header check (single-tenant in v0.1)
  blind_spots.py      Sliding-window tracker for Unknown_Error prefixes
  config.py           Typed wrapper around .env
  error_normalizer.py Pure regex: error_log → 10 standardized symptom labels
  inmemory_router.py  RAM-frugal numpy cosine fallback (no SeekDB needed)
  outcomes.py         injection_outcomes persistence + per-pattern aggregation
  semantic_router.py  SeekDB vector search + inspiration assembly + priority gate
  seekdb_client.py    pyseekdb wrapper; auto-creates database; embedded+server
  state_router.py     SAFETY / FREE_EXPLORATION / CATALYST classifier
```

### Router backends

`ROSCLAW_HOW_ROUTER_BACKEND` chooses:

- `auto` (default) — `seekdb` when datafile exists, else `inmemory`
- `seekdb`   — explicit production path; raises on init failure
- `inmemory` — explicit RAM fallback; reads bridge_index.json directly

Both routers expose the same `find_nearest()` contract plus `cluster_count`
and `embedding_dim` properties (surfaced on `/healthz`).
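
The selection rule for the three settings is small enough to show in full. The helper below is a hypothetical sketch; how the service actually detects the datafile is not described here, so that check is passed in as a boolean.

```python
def choose_backend(setting: str, datafile_exists: bool) -> str:
    """Resolve ROSCLAW_HOW_ROUTER_BACKEND to 'seekdb' or 'inmemory'."""
    normalized = setting.strip().lower() or "auto"  # empty value means default
    if normalized == "auto":
        return "seekdb" if datafile_exists else "inmemory"
    if normalized in ("seekdb", "inmemory"):
        return normalized
    raise ValueError(f"unknown router backend: {setting!r}")
```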

### Runtime priority gate

When rosclaw-know's `bridge_reweighter` decides a cluster has been hurting
agents (negative aggregate uplift with sufficient `n`), it writes
`"priority": -1` into the cluster entry of `bridge_index.json`. After the
next asset publish:

- `asset_loader` carries the field into `symptom_index` metadata.
- `SemanticRouter.find_nearest` over-fetches the top 3×k results and walks them in
  similarity order, skipping any cluster with `priority < 0`.
- `InMemoryRouter.find_nearest` applies the same filter against its in-RAM
  matrix.

So a soft-deprecated cluster vanishes from CATALYST hits on the next
asset-loader cycle without any agent code change.
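
The gate itself is a filter over ranked candidates. The sketch below models it on plain dicts instead of SeekDB results; the over-fetch factor of 3, the candidate shape, and applying the similarity floor in the same pass are assumptions made for illustration.

```python
from typing import List


def find_nearest(candidates: List[dict], k: int = 1,
                 similarity_floor: float = 0.5, overfetch: int = 3) -> List[dict]:
    """Return up to k non-demoted candidates, best similarity first."""
    ranked = sorted(candidates, key=lambda c: c["similarity"], reverse=True)
    hits: List[dict] = []
    for candidate in ranked[: overfetch * k]:  # over-fetch, then filter
        priority = candidate.get("priority")   # unset is treated as production
        if priority is not None and priority < 0:
            continue  # soft-deprecated cluster: skip it
        if candidate["similarity"] < similarity_floor:
            break  # everything after this is even less similar
        hits.append(candidate)
        if len(hits) == k:
            break
    return hits
```

Over-fetching matters because the best raw match may be demoted; without it, a single bad cluster at the top would turn a valid CATALYST hit into an empty result.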

## What this replaces

The previous `rosclaw-wiki` cloud API hosted 17 declarative-knowledge endpoints
(search, judgments, code generation, etc.). Empirically, agents in
Frontier-Engineering's optimization loop regressed ~20% when they pulled from
those endpoints — they got encyclopedic context when they needed a poke.

`rosclaw-how` is the focused replacement: nine endpoints, three strategies,
≤400 tokens per CATALYST snippet, no LLM in the hot path, and a feedback
loop that keeps the asset bundle honest.

## Closed-loop verification

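Two lightweight harnesses, each tuned to a different cost/coverage budget; a
heavier third script, `scripts/verify_how.py`, runs the real
Frontier-Engineering eval: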

- `scripts/verify_how_seekdb.py` — strict 4-case verifier that pre-flights
  `/healthz` (refuses non-`seekdb` backends), then asserts each case is
  `CATALYST` with `similarity ≥ similarity_floor` and `latency_ms < 1500`.
  Writes `data/benchmarks/how_ab_seekdb/summary.json`.

- `scripts/verify_how_lite.py` — A/B against DeepSeek for 4 stuck cases
  (control: FREE_EXPLORATION; treatment: CATALYST). Used by the
  rosclaw-know side's `replay_benchmark.py` to drive 50+ synthetic rollouts
  end-to-end through build → inject → feedback → distill → re-publish.

```bash
export ROSCLAW_HOW_API_KEY=rw_sk_dev_local

# Strict 4-case check against the deployed service (seekdb backend only)
python scripts/verify_how_seekdb.py

# Faster smoke against DeepSeek, no Frontier-Engineering setup needed
python scripts/verify_how_lite.py --no-agent

# Heavy: hits the real Frontier-Engineering eval (needs that repo)
python scripts/verify_how.py --iterations 500
```
