Metadata-Version: 2.4
Name: harness-maker
Version: 0.14.0
Summary: Project-tailored Claude Code / Cursor / Codex harness generator. Grade-gated review + auto-fix, multi-reviewer consensus, edit-preserving block-merge upgrades, anti-rot crawlers, worktree isolation, version-drift detection, and a 3-layer ai-readiness rubric with ranked improvement actions.
Keywords: claude-code,cursor,codex,harness,autoloop,anti-rot,code-review,consensus,block-merge,worktree,observability,anthropic
Author: Ecro
Author-email: Ecro <e839638@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Dist: jinja2>=3
Requires-Dist: pyyaml>=6
Requires-Dist: pydantic>=2
Requires-Dist: anthropic
Requires-Dist: httpx
Requires-Dist: feedparser
Requires-Dist: typer
Requires-Dist: rich
Requires-Dist: beautifulsoup4>=4.12
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/Ecro/harness-maker
Project-URL: Repository, https://github.com/Ecro/harness-maker
Project-URL: Issues, https://github.com/Ecro/harness-maker/issues
Description-Content-Type: text/markdown

# harness-maker

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org)
[![Claude Code plugin](https://img.shields.io/badge/Claude_Code-plugin-orange)](https://code.claude.com)
[![Cursor 2.4+ (3.2+ rec)](https://img.shields.io/badge/Cursor-2.4%2B_(3.2%2B_rec)-black)](https://cursor.com)
[![Built with uv](https://img.shields.io/badge/built_with-uv-261230.svg)](https://docs.astral.sh/uv/)

> **A harness that knows your project — and stays that way.**

Interview-born. Grade-gated. Anti-rot by design. Multi-target.

[Why](#why-harness-maker) ·
[Quickstart](#quickstart) ·
[Features](#features) ·
[How it works](#how-it-works) ·
[Slash commands](#slash-commands-the-harness-exposes) ·
[Comparison](#how-it-compares) ·
[Configuration](#configuration) ·
[Targets](#targets) ·
[FAQ](#faq) ·
[Roadmap](#roadmap) ·
[Deep dive](docs/HOW-IT-WORKS.md)

---

## Why harness-maker?

Most Claude Code setups start from a generic template and drift from day one. harness-maker takes a different stance — it builds a harness that is **shaped around your project** and **keeps that shape over time**.

Four principles drive every design decision:

**1. Personalized — Interview-born, not template-pasted.**
The profiler reads your stack, scale, and lifecycle. The interview locks in preset, dev mode, target runtimes, and reviewer depth. A `Side` experiment and a `Production` service get structurally different harnesses — different reviewer sets, different workflow stage counts, different security gates. No generic defaults silently shipped.

**2. Trusted — Grade-gated, not hope-based.**
Every `/hm:execute` runs in a fresh git worktree and follows a TDD loop. `/hm:review` doesn't just report findings — it applies consensus-passed fixes and re-reviews until the grade meets the configured threshold (default A). Pre-LLM mechanical checks (lint, tests) gate the review before any reviewer agent spawns. Reviews you trust don't need second-guessing.

**3. Anti-rot — Built-in, not retrofitted.**
The Claude Code, Cursor, and Codex ecosystem moves fast. harness-maker ships a weekly crawl across 4 sources (Anthropic blog, GitHub releases, arxiv cs.SE/CL/CR, OSV.dev), scores each item for relevance, and surfaces only what matters — always manual-confirmed, never auto-applied. The failure-memory system proposes new skills and rules when the same failure recurs 3× across sessions.

**4. Evolving — Refresh cycles, not rewrites.**
`/harness-maker:make --update` picks up new template improvements without touching your edits (hash-based KEEP/REPLACE per file). The session memory and wiki build up project-specific conventions. Failure proposals turn recurring stumbles into permanent fixes. The harness improves with the project, not against it.

---

## Table of Contents

- [Why harness-maker?](#why-harness-maker)
- [Quickstart](#quickstart)
- [Requirements](#requirements)
- [Features](#features)
- [How it works](#how-it-works)
- [Slash commands the harness exposes](#slash-commands-the-harness-exposes)
- [How it compares](#how-it-compares)
- [Configuration](#configuration)
- [Targets](#targets)
- [Reconcile rules](#reconcile-rules-re-rendering-an-existing-harness)
- [Observability](#observability)
- [Marketplace](#marketplace)
- [FAQ](#faq)
- [Roadmap](#roadmap)
- [Development](#development)
- [Contributing](#contributing)
- [License](#license)

---

## Quickstart

### Universal Bootstrap Prompt

> **Paste into any LLM agent — it installs harness-maker and generates your project harness
> automatically.** Works in Claude Code, Cursor, Codex CLI, or any chat with shell access. The
> agent detects your OS (Linux / macOS / Windows / WSL) and IDE on its own; you do not branch
> by hand.

```
Install and bootstrap harness-maker for this project, end-to-end.

You are an LLM agent with shell access. Detect the environment silently, install
harness-maker, and render the harness. Ask me ONE question (target IDE confirmation)
and otherwise proceed autonomously. Conduct the conversation in the language of my
first reply; default to English if you cannot tell.

Step 1 — Environment detection (silent — do NOT ask me):
  - OS: try `uname -s` (POSIX → Linux / Darwin / CYGWIN* / MINGW*). If `uname` fails,
        returns empty, or returns a value outside that set, treat the host as native
        Windows.
  - WSL: when `uname -s` returns Linux AND `/proc/version` contains "microsoft" or
        "Microsoft" (case-insensitive), treat as WSL — use the POSIX path below.
  - IDE: pick the first match — `$CLAUDE_CODE` or `~/.claude/` → Claude Code;
        `$CURSOR_SESSION` or `~/.cursor/` → Cursor; `$CODEX_SESSION` or `~/.codex/` → Codex;
        none → plain terminal (CLI mode).
  - Python: run `python3 --version` (POSIX) or `python --version` (Windows). Need 3.12+.
        If missing, install via `uv python install 3.12` after Step 2.
  - uv: run `uv --version`. If missing, install:
        • POSIX / macOS / WSL: `curl -LsSf https://astral.sh/uv/install.sh | sh`
        • Native Windows (PowerShell): `irm https://astral.sh/uv/install.ps1 | iex`

Step 2 — Install harness-maker (skip if harness-maker is already on PATH):
  uv tool install harness-maker

Step 3 — Profile the project:
  # POSIX / macOS / WSL:
  harness-maker profile "$(pwd)" --json
  # PowerShell (Windows native) — quote-expanded for paths with spaces:
  harness-maker profile "$((Get-Location).Path)" --json
  Read stack / scale / lifecycle. Tell me: "Your project looks like {stack}, {scale} scale,
  {lifecycle} lifecycle." Recommend a preset (Side or Production) based on the profile.

Step 4 — Confirm targets (ask me ONCE):
  Based on Step 1's IDE detection, suggest one of:
    • Claude Code only → targets=claude-code
    • Cursor only       → targets=cursor
    • Codex only        → targets=codex
    • Multiple          → comma-separated (e.g. claude-code,cursor)
  Wait for my confirmation, then continue.

Step 5 — Generate the harness:
  # POSIX / macOS / WSL:
  harness-maker make "$(pwd)" --preset <PRESET> --locale en \
    --targets <TARGETS> --autoloop
  # PowerShell (Windows native):
  harness-maker make "$((Get-Location).Path)" --preset <PRESET> --locale en `
    --targets <TARGETS> --autoloop

Step 6 — IDE reload:
  • Claude Code: run `/harness-maker:make` to verify the plugin loads.
  • Cursor: reload the window (Ctrl+Shift+P → "Reload Window") so agents, skills, and
    commands under .claude/ are discovered.
  • Codex: AGENTS.md and .codex/ are read on next session start.
  Tell me the harness is ready and suggest: /hm:health
```

<details>
<summary><strong>Manual install (step-by-step, no LLM)</strong></summary>

```bash
# 1. Install harness-maker CLI from PyPI
uv tool install harness-maker

# 2. Profile and generate the harness
cd your-project
harness-maker profile . --json         # inspect your stack/scale/lifecycle
harness-maker make . --preset Production --locale en --targets claude-code,cursor

# 3. (Optional) Load as Claude Code plugin for /harness-maker:make command
claude --plugin-dir /path/to/harness-maker
```

Native Windows / PowerShell equivalents:

```powershell
# Install uv if you don't have it
irm https://astral.sh/uv/install.ps1 | iex

# Install harness-maker and run
uv tool install harness-maker
harness-maker profile . --json
harness-maker make . --preset Production --locale en --targets claude-code,cursor
```

</details>

The interview asks for locale first, then preset (`Side` or `Production`), dev mode, and target runtime. The setup preview explains generated roots, backups, preserved user blocks, target-specific leftovers, review trade-offs, and how to continue advanced Second Brain setup with `/hm:configure`. A fully-rendered harness is ready in one turn.

Re-run with flags to evolve the harness:

```bash
harness-maker make . --audit           # score the existing .claude/ against the rubric
harness-maker make . --add NAME        # graft on a single skill/agent/command
harness-maker make . --remove NAME     # surgically remove one component
harness-maker make . --promote NAME    # move an ad-hoc artifact into the harness
```

---

## Requirements

| Dependency | Notes |
|---|---|
| **Python 3.12+** and **[`uv`](https://docs.astral.sh/uv/)** | Required wherever Claude Code runs against your project — even if your project's primary language is Rust, Node, or Go. Hooks invoke `uv run python -m harness_maker.gates.*`; without `uv` they are silent no-ops. |
| **Claude Code CLI** (plugin + hook support) | Optional for plugin mode (`claude --plugin-dir /path/to/harness-maker`). Not required — the `harness-maker` CLI can generate harnesses independently. |
| **Cursor IDE 2.4+** (3.2+ recommended) | Optional. Reads `.claude/agents/`, `.claude/skills/`, and `.claude/commands/hm/*.md` natively (verified empirically in 0.6.2, re-confirmed 0.7.1 — see `tests/cursor-compat/results-2026-05-08.md`). Hooks render to a separate `.cursor/hooks.json` with Cursor-native schema; both files are emitted when `targets` includes `cursor`. Cursor 3.0+ adds native `/worktree` and `/best-of-n` which coexist safely with harness-maker's prefix-matched cleanup. |
| **OpenAI Codex CLI** | Optional. When `targets` includes `codex`, harness-maker renders Codex-native `.codex/` config, `AGENTS.md`, and `.agents/skills/` assets from the same workflow definitions. |
| **Git** | Required for worktree isolation (every `/hm:execute` and `/hm:loop` run). |

---

## Features

- **Single command, no subcommand sprawl.** `/harness-maker:make` is the only entry point. Everything else is a flag (`--audit`, `--add`, `--remove`, `--promote`). No muscle-memory tax.

- **Two presets, ten override dimensions.** `Side` (1 reviewer, lean, fast) and `Production` (5 reviewers, verify-required, secure) cover ~90% of teams. The remaining 10% comes from override dimensions surfaced in the interview: workflow naming, models, autoloop, anti-rot, worktree, security, context-lint, memory, caching.

- **Three targets from one harness.** Claude Code gets the native `.claude/` runtime. Cursor reuses `.claude/agents/`, `.claude/skills/`, and `.claude/commands/hm/` while receiving Cursor-specific hooks and rules. Codex receives `AGENTS.md`, `.codex/config.toml`, `.codex/hooks.json`, agent TOML files, and `.agents/skills/` so the same stage and workflow model runs in Codex without hand-porting prompts.

- **Unified health audit.** `/hm:health` runs a 3-layer composite: **structural** (ai-readiness deterministic + LLM rubric, 70%/25%/5%), **external risks** (4-source anti-rot crawl + LLM relevance filter), and **personalization** (ADR-011 rubric — Bronze/Silver/Gold/Platinum). Outputs a 3-section `.claude/observability/dashboard.md`. Every item routes through `accept` / `reject` / `defer` — no auto-apply, ever (ADR-001). All telemetry stays local. Replaces 0.12.x's separate `/hm:ai-readiness`, `/hm:refresh`, and `/hm:personalization-audit` commands.

- **Anti-rot pipeline.** Weekly crawl across 4 sources: Anthropic blog/changelog, GitHub releases (`anthropics/claude-code`), arxiv (cs.SE / cs.CL / cs.CR), OSV.dev CVEs. Each item is LLM-scored for relevance with an adaptive threshold (starts at 0.7, adapts ±0.05 based on your accept/reject history). **Always manual-confirmed** — there is no `--auto-apply` path.

- **Worktree isolation per run.** Every `/hm:execute` runs in a fresh `git worktree` under `.worktrees/`. `/hm:loop` allocates one worktree for the whole loop, shared across iterations to reduce branch churn. `sibling_repos` can opt related repos into the same isolation session for split frontend/backend or app/library work. Successful runs clean up; failed runs preserve evidence. Prefix-match cleanup never touches Cursor-managed worktrees in the same directory.

- **7 security gates.** `secrets` (regex + entropy, gitleaks-style), `permissions` (`settings.json` over-grant detection), `hook injection` (`hooks.json` AST scan for `rm -rf`, `curl | sh`, `eval`), `dependency CVEs` (OSV.dev), `hallucination` (AST scan for non-existent imports — pure-filesystem check, no execution of LLM-generated code), `prod-name guard` (cross-tool sequence detection for production-targeting patterns), `prompt injection` (hidden-instruction pattern detection + privilege separation, regex + LLM second pass). Findings go to `.claude/observability/security/findings-*.jsonl` — never transmitted.

- **Privilege separation.** Reviewer agents get `permissions.deny: [Write(*), Edit(*), Bash(rm:*), Bash(curl:*), Bash(npm:*), Bash(eval *), Bash(python:*), Bash(node:*), Bash(sh:*), Bash(bash:*)]` — interpreter denies block subprocess-bypass attempts (0.6.2 hardening). Executor agents get `permissions.allow: [Write(.worktrees/**), Edit(.worktrees/**), Bash(uv run:*), Bash(pytest:*), …]` plus paired Edit/Write denies on system paths (`/etc`, `~/.ssh`, `~/.aws`). Combined with worktree isolation and the 0.7.1 telemetry tool-input whitelist + secret redaction (ADR-107), this gives defense-in-depth: even a prompt-injected reviewer cannot write to disk or shell out via interpreters; even a compromised executor cannot write outside the active worktree or touch system credentials; even a poisoned `tool_input` payload cannot leak credentials into the metrics log.

- **Brownfield-safe.** `Reconciler` indexes existing `.claude/`, computes hash-based ours/theirs decisions via provenance frontmatter, and offers per-conflict keep/replace/both. Apply is ADD-only with timestamped backups. User edits are never silently overwritten.

- **Mechanical pre-checks in review.** Add shell commands to `harness.yaml.reviewers.mechanical_checks` (e.g., `ruff check .`, `uv run pytest tests/unit -x -q`). These run **before** any LLM reviewer — stop-on-first. If a command fails, `/hm:review` emits `## MECHANICAL_BLOCK: <cmd> exit=<N>` and halts immediately: no grade, no auto-fix loop, no wasted reviewer tokens on broken code. The list is preserved across re-renders.

- **Deep interview before every implementation.** `/hm:spec` runs a 6-category interview (Intent → Outcomes → In-Scope Scenarios → Non-Goals → Constraints → Verification) scored for completeness; incomplete categories trigger follow-up questions, not silent gaps. `/hm:plan` runs a 9-category interview (scope → architecture → contract → risk → testing → phasing → dependencies → failure handling → observability) in impact order. Each decision that changes a component boundary, introduces a new contract, or rejects a viable alternative is promoted to a formal **Architecture Decision Record** (ADR) and becomes a binding constraint for `/hm:execute` — not advisory. A `plan-validator` agent gates the plan before it is written to disk: NEEDS_REVISION triggers targeted follow-up rounds; MAJOR_REVISION escalates to the user. The interview ends with a "no deferred decisions" scan — any "Accept?/OK?/Verify?" phrasing is a missed interview round, not a plan checkpoint.

- **Autoloop with adaptive interview and 4-gate convergence.** `/hm:loop` runs time-and-iteration-bounded loops. Before the first iteration, `autoloop-driver` reads the goal description, extracts already-answered dimensions (purpose / invariants / priority / stopping_criteria), and asks only what's missing — no fixed question script. It locks a loop intensity and explicit exit checklist, then requires mechanical checks, per-criterion LLM judgment, regression comparison, and a two-iteration convergence streak before accepting completion. A single worktree is shared across all iterations, and failed iterations preserve the worktree for inspection.

- **Refdocs search skill.** Register your project's reference folders (architecture docs, API specs, design docs) in `harness.yaml`. `/harness-maker:make` builds a local `docs_index.yaml`, and the `refdocs-search` skill gives the LLM lossless full-text search across registered folders — no chunking, no RAG index.

- **SessionStart drift reminder.** A hook fires on every session open and warns if the running harness-maker version differs from the version that rendered the harness — so you notice when a `/plugin update` needs a re-render. The detector compares `harness.yaml.harness_maker_version` against the **latest plugin version cached on disk** (not just the imported `__version__`), so `/hm:health` and SessionStart agree even when the slash command runs against a pinned older version (0.6.2).

- **Memory tier with cross-process safety.** `.claude/memory/` holds `episodic/` (per-day JSONL), `semantic.jsonl` (queryable index), `profile.json`, `wiki.md`, and `failures.md`. Concurrent writers from parallel sessions are serialised via a re-entrant POSIX flock — same thread can re-acquire without deadlock, different threads block normally (ADR-106, 0.7.1). Telemetry hooks append atomically via raw `os.write()` on `O_APPEND` (single-syscall, ≤ PIPE_BUF) so concurrent Claude Code + Cursor hooks cannot interleave JSONL lines.

- **3-tier context loading + compaction recovery.** Every stage opens memory in tier order — Hot (`session/<today>.md`), Warm (`failures.md` + `wiki.md` first 60/40 lines), Cold (git log / PLANs on demand). `PreCompact` hook flushes the session to `session/<today>.md` with a `checkpoint:compaction` marker before Claude compacts context; the next turn detects the marker and resumes from the last in-progress phase without losing progress.

- **2-pass redaction for review precision (+47 pp).** Reviewers run twice: Pass 1 strips PR title, author, and commit message so findings aren't anchored to metadata; Pass 2 restores full context and each reviewer validates or drops their Pass 1 findings. Findings absent from Pass 2 are dropped (CP10 contract). Ablation showed a +47 percentage-point precision gain on anchoring-prone diffs.

- **Self-improving failure memory.** Every stage appends failure patterns (wrong API usage, broken convention, unexpected build failure) to `failures.md` with a count. When any failure slug reaches count ≥ 3, a proposal is automatically appended to `pending-proposals.md` — suggesting a new skill, rule, or hook that would have prevented the recurrence. The user reviews and decides whether to ingest.

- **ADR system as binding execute constraints.** Architecture Decision Records promoted during `/hm:plan` are hard constraints — `/hm:execute` must not violate them. If a PLAN phase conflicts with an ADR, execute surfaces it as a blocker rather than silently proceeding. ADRs capture rejected alternatives and the reasoning, so future sessions don't re-litigate settled decisions.

- **Cache miss classification (4 reasons).** The prompt-cache diagnostic layer reports why a cache miss occurred: `min_threshold` (content too short), `invalidation` (context changed), `ttl` (5-min TTL expired), or `first` (cold start). The 5% weight in the AI-readiness score distinguishes cold-start misses (benign) from structural misses (actionable), so you fix the right thing.

- **Grade-based auto-fix loop.** `/hm:review` computes a grade (A–F) from `consensus-passed` P0/P1 findings and loops: apply fixes → re-review (selective — only re-spawn reviewers whose scope was touched) → regrade, until grade meets `grade_threshold` (default A) or `max_review_rounds` is exhausted. Failed fixes that break the build are automatically reverted and logged. Weak-consensus and manual-only findings are never auto-applied.

- **Pre-LLM mechanical checks gate.** Add shell commands to `harness.yaml` once and they run at the start of every `/hm:review` — before any reviewer agent spawns. Stop-on-first: the first non-zero exit emits `## MECHANICAL_BLOCK: <cmd> exit=<N>`, halts the review, and exits `CHANGES_REQUESTED`. Lint clean and tests green are enforced mechanically, not by reviewer prompt. `--no-auto-fix` does not skip mechanical checks.

- **Detection Depth** — 12+ stack/framework granularity (python/node/rust + java/kotlin/swift/dart/ruby/php/csharp/elixir/scala/c-cpp/zig/haskell), framework dep parsing, package_manager + ci_provider detection, manifest-mtime cache with 24h ceiling.

- **Foreign AI Config Migration** — detect 6 known foreign configs (`.cursor/rules/`, `AGENTS.md`, `CLAUDE.md`, `.continue/config.json`, `.aider.conf.yml`, `.github/copilot-instructions.md`); LLM-driven import; `@hm:harness:*` inverted markers preserve user content across re-renders.

- **Adaptive Personalization** — `/hm:health` Step 3 composite-score rubric (Bronze/Silver/Gold/Platinum) with evidence-bearing ActionItems; 100% local telemetry; SessionStart drift hint after 30 axis overrides or 14 days. (Absorbs the 0.12.x `/hm:personalization-audit` command; ADR-011 rubric unchanged.)

- **Confidence-Bucketed Recommendation UI** — every detection declares its own confidence (HIGH/MEDIUM/LOW); high → silent yaml comment, medium → explicit prompt, low → no surface. Backward-compat regression test guards 0.11.x users from surprise silent-default changes.

For the complete mechanics behind each feature — all procedures, decision paths, and internal invariants — see [**docs/HOW-IT-WORKS.md**](docs/HOW-IT-WORKS.md).

---

## How it works

```mermaid
flowchart TD
    A["/harness-maker:make"] --> B["Profile\n(stack, scale, lifecycle)"]
    B --> C["Interview\n(preset + 10 dims + targets)"]
    C --> D["Synthesize\n(deterministic Blueprint)"]
    D --> E["Render\n(Jinja2 + provenance frontmatter)"]
    E --> F{Brownfield?}
    F -- No --> G["Write .claude/ directly"]
    F -- Yes --> H["Reconcile\n(hash-based keep/replace/both)"]
    H --> G
    G --> I["Extra targets?\nRender .cursor/ and/or .codex/ assets"]
    I --> J["User runs /hm:* commands"]
    J --> K["Weekly /hm:health (Step 2)\n4-source anti-rot crawl\n→ manual confirm"]
```

**14 mechanisms** (M1-M14) back every feature. See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for the full breakdown including the privilege-separation model, security gate triggers, and reconcile invariants.

Since 0.12.0 the synthesis pipeline also threads through a typed `Recommendation` registry: every `recommend_<axis>(profile, project_dir)` function declares per-detection confidence (ADR-007), and the `interview.py` dispatcher routes by bucket. Detection results land in `~/.cache/harness-maker/profile-<repo-hash>.json` with manifest-mtime + 24h-ceiling invalidation. Foreign AI config files detected at `/hm:configure` time can be imported into `harness.yaml` and re-rendered single-source with `@hm:harness:*` inverted block markers (ADR-009).

---

## Slash commands the harness exposes

After install, the rendered harness exposes commands under `/hm:*`:

### Atomic stages (always available)

| Command | Purpose |
|---|---|
| `/hm:research` | Gather facts, user-workflow signals, best practices, and options |
| `/hm:spec` | Write acceptance criteria from research |
| `/hm:plan` | Decompose spec into phases with exit criteria |
| `/hm:execute` | Implement with TDD + worktree isolation |
| `/hm:review` | Multi-reviewer consensus (mechanical pre-check → conditional routing) |
| `/hm:wrapup` | Clean, document, commit |
| `/hm:verify` | 6-check gate before completion |

### Fused workflows (preset-generated, user-renameable)

Fused workflows combine atomic stages into a single command. The interview generates a starter set; you can add, remove, or rename them in `harness.yaml`.

| Preset | Default fused workflows |
|---|---|
| **Side** | `/hm:plan-exec-rev` · `/hm:exec-rev` · `/hm:exec-rev-wrap` _(default)_ |
| **Production** | `/hm:exec-rev-wrap-ver` _(default)_ · `/hm:exec-rev-wrap` · `/hm:plan-exec-rev` · `/hm:exec-rev` · `/hm:res-spec-plan` |

### Utility commands

| Command | Purpose |
|---|---|
| `/hm:loop "<goal>"` | Autoloop driver — `feature` or `improve` mode, time/iter-bounded |
| `/hm:ai-readiness` | 3-layer readiness score + P0/P1/P2 ranked actions |
| `/hm:personalization-audit` | Composite-score rubric (Bronze/Silver/Gold/Platinum) from telemetry + harness.yaml + ProjectProfile. Reads `.claude/observability/adaptive/overrides.jsonl`; outputs ranked ActionItem list with evidence schema. ADR-011 (v0); calibration deferred to 30+ project sample. |
| `/hm:refresh` | Anti-rot crawl — manual confirm required |

---

## How it compares

Other Claude Code harnesses pick a niche; harness-maker is the **meta-tool** that builds them — and then keeps them current.

| Project | Scope | What harness-maker adds |
|---|---|---|
| **ohmyclaudecode** | Curated commands/agents bundle | Project-tailored synthesis (preset + 10 override dims), brownfield reconcile, provenance frontmatter, anti-rot pipeline |
| **superpowers** | Powerful sub-agents and workflows | Single-command entry, AI-readiness scoring, worktree isolation by default, privilege-separated reviewer/executor |
| **Archon** | Knowledge-base + RAG-backed planning | Stack/scale/lifecycle profiler, atomic+fused workflow engine, conditional reviewer routing, 7 security gates |
| **aider** | Terminal pair programmer, LLM-agnostic | Claude Code / Cursor / Codex native assets; harness output is the runtime (not the session); anti-rot keeps it current |
| **ouroboros** | Autonomous self-bootstrapping AI software factory | Project-shaped interview (not one pipeline for all), grade-gated reviews with mechanical pre-checks, anti-rot pipeline, brownfield reconcile, multi-target support |
| **Hand-rolled `.claude/`** | Full control, zero automation | Drift detection via provenance hash, weekly anti-rot crawl, AI-readiness scoring, worktree isolation — without writing it yourself |

The core difference: harness-maker generates and **owns the lifecycle** of your `.claude/` directory. A static tool gives you a starting point. harness-maker gives you a starting point that knows who it was created for and can be updated without losing your changes.

---

## Configuration

The interview writes answers to `.claude/harness.yaml`. Key dimensions:

```yaml
preset: Production           # Side | Production
locale: en                   # en | ko | <any — unknown falls back to en>
dev_mode: spec-driven        # spec-driven | task-driven
targets:                     # which runtimes to drive
  - claude-code
  - cursor
  - codex
recommended_model: claude-opus-4-7

ref_folders:
  - path: ../architecture-docs
    glob: "**/*.{md,txt,pdf}"

second_brain:
  enabled: true
  backend: filesystem
  project_id: my-app
  vault_path: ../obsidian-vault
  trusted_allowlist: true       # configured write folders need no confirmation/backup
  required_frontmatter: [type, created, updated, tags, links]
  folders:
    - path: Projects/my-app
      read: true
      write: true
      note_types: [decision, preference, failure, project, reference, journal]

sibling_repos:
  - ../backend

reviewers:
  enabled: [code, security, performance, ux, concurrency]
  routing: conditional       # conditional | always-all
  mechanical_checks:         # pre-LLM stop-on-first gate (optional)
    - ruff check .
    - uv run pytest tests/unit -x -q

worktree:
  scope: [execute, plan]     # which stages run in a fresh worktree
  cleanup: on_success        # on_success | always | never

anti_rot:
  enabled: true
  threshold: 0.7             # adaptive — adjusts ±0.05 based on accept/reject ratio

context_lint:
  strict: true               # warn on overrun | block

memory:
  files: [failures.md, wiki.md]

adaptive:
  disable_telemetry: false   # opt-out per ADR-005; 100% local capture
  audit_session_threshold: 30  # SessionStart hint after N axis overrides
  audit_days_threshold: 14     # SessionStart hint after N days without audit
```

All adaptive features are 100% local. `tests/unit/test_no_network.py` asserts no socket call during telemetry emit, audit, or SessionStart hook execution (ADR-005 positive obligation).

Run `/harness-maker:make` again and choose **Update** (same settings, pick up template improvements) or **Full reconfigure** to change any dimension.

<a id="second-brain-setup"></a>
### Obsidian Second Brain

`second_brain` connects a Markdown/Obsidian vault as a typed knowledge graph for
stage-aware memory. `hm-research`, `hm-plan`, `hm-review`, and `hm-wrapup` use
different note types (`reference`, `project`, `decision`, `preference`,
`failure`, `journal`) instead of loading the whole vault. Writes are full
Markdown writes inside configured `write: true` folders only. Configure narrow
folders: the allowlist is trusted completely, with no per-write confirmation or
backup. For multiple projects sharing one vault, give every project a distinct
`project_id` and put writable folders under a path segment with the same value
(for example `Projects/my-app`). harness-maker rejects writable Second Brain
folders that do not include the configured `project_id`, which keeps one
project from writing into another project's note namespace.

**Setup walkthrough**

1. **At interview time** — when prompted for `vault_path`, supply the absolute
   path to your Obsidian vault root (or a not-yet-created subfolder of it; the
   harness creates the subfolder on first write iff the parent has
   `.obsidian/`). The interview then asks for `project_id` and a writable
   folder, defaulting to `99_HM/{project_id}/` (matches the `99_*/01_*`
   numeric-prefix organization style).
2. **Post-install adjustment** — run `/hm:configure` to revisit Second Brain
   settings. The slash command dispatches to
   `harness-maker configure-second-brain --check`, which inspects state and
   returns guidance JSON so the slash command can prompt only when something
   is missing. To add a folder non-interactively, call
   `harness-maker configure-second-brain --add-folder 99_HM/my-project/`.
3. **Existing harnesses upgraded from a pre-fix release** — the loader now
   tolerates `folders: []` (logs a one-shot warning + remediation hint
   pointing at `/hm:configure`); writes raise `SecondBrainError` with the
   same hint instead of an obscure "not under a configured write folder"
   message.

Internals: the rendered `harness.yaml` carries a provenance frontmatter block;
all readers route through `harness_maker.io_utils.load_harness_yaml` (the
canonical multi-document-tolerant loader) to avoid the parser-strategy drift
that produced the original bug.

---

## Targets

`targets` is a multi-select. Choose `claude-code`, `cursor`, `codex`, or any combination. Claude Code is the default; Cursor and Codex add runtime-specific assets while preserving the same preset, workflows, skills, reviewers, and safety model.

### Cursor target

Run `/harness-maker:make` and pick `targets: [cursor]` or `[claude-code, cursor]` at the interview. The renderer adds:

- `.cursor/rules/harness.mdc` — always-on workflow rules with Cursor-legal frontmatter (`description` / `globs: []` / `alwaysApply: true`)
- `.cursor/hooks.json` — Cursor-native hooks schema (lowercase camelCase keys, `version: 1`, flat `{matcher, command}`). **Deliberately different from `.claude/hooks/hooks.json`** (PascalCase, nested `{hooks:[…], matcher}`); each IDE reads only its own file. Don't try to collapse them — Cursor will silently stop firing hooks. See `tests/cursor-compat/results-2026-05-08.md` for the kairos 0.5.7 forensic that proved this empirically.
- `.cursor/mcp.json` — Cursor MCP server config. Populated from `harness.yaml.mcp_servers` (0.6.2+); defaults to `{"mcpServers": {}}` when no servers are configured. Inner shape (`command`, `args`, `env`) is type-validated on parse with a warning log when entries are dropped.

`.claude/agents/`, `.claude/skills/`, and `.claude/commands/hm/` are single-source — Cursor 2.4+ reads them natively (forensic-verified). Hooks are the only asset that requires per-IDE rendering because the schemas diverge by design.

### Recommended model

`harness.yaml.recommended_model` defaults to `claude-opus-4-7` and propagates to agent frontmatter. Cursor users may override model selection in their IDE. The harness does **not** rewrite prompts to be model-agnostic — `<thinking>` blocks and Claude-specific patterns are preserved deliberately.

### Verification

Per-release Cursor compatibility is tracked in `tests/cursor-compat/`:

- `MANUAL_CHECKLIST.md` — A1–A4 (agent dispatch, hook fire, skill auto-discovery, slash command + Q&A loop) covering both IDEs
- `RESULTS.md` — PASS/FAIL/PARTIAL grid you fill while running the checklist
- `results-2026-05-08.md` — kairos 0.5.7 forensic that resolved Q-A (hooks discovery) and Q-B (commands discovery) without an IDE-driven manual run; future Cursor verifications append a new dated `results-*.md`
- `fixture/` — minimal `.claude/` for opening directly in either IDE

Automated CI guards the dual-schema invariants regardless of manual fixture runs:

- `test_cursor_hooks_uses_lowercase_native_schema` — fails if `.cursor/hooks.json` accidentally adopts Claude PascalCase
- `test_no_cursor_commands_rendered` — fails if a future change starts emitting `.cursor/commands/hm-*.md` mirrors (Cursor reads `.claude/commands/` natively)
- `test_render_agents_have_structured_permissions_frontmatter` — fails if any agent template loses its `permissions.allow/deny` block (Cursor 2.5+ subagent permission inheritance gap)

### Codex target

Run `/harness-maker:make` and pick `targets: [codex]` or include `codex` with the other targets. The renderer adds:

- `AGENTS.md` — Codex's top-level project instruction file, rendered without YAML frontmatter so Codex displays clean instructions.
- `.codex/config.toml` and `.codex/agents/*.toml` — Codex-native configuration and agent registrations.
- `.codex/hooks.json` — Codex hook wiring, including Codex-specific permission events and file-edit tool matchers.
- `.agents/skills/*/SKILL.md` — the existing harness skills plus stage, workflow, and loop trigger skills in the layout Codex discovers.

Codex TOML files are rendered as pure TOML, not markdown-with-frontmatter. `AGENTS.md` uses block-merge markers so user additions survive re-renders, while `.codex/*.toml` files are regenerated from the selected target configuration.

**Structured questions in Codex:** Stage and loop skills instruct the agent to use the `request_user_input` tool for interview questions. This tool is available in Plan mode by default. To enable it in Code mode, add `default_mode_request_user_input = true` under `[features]` in `.codex/config.toml`. If the tool is unavailable at runtime, the agent falls back to asking in its response text.

---

## Reconcile rules (re-rendering an existing harness)

Re-running `/harness-maker:make` on an existing harness uses a hash-driven KEEP rule: if a file's `content_hash` frontmatter matches the new template's hash, it's "ours" — safe to overwrite. If it differs (user-edited), it's "theirs" — kept.

**Trade-off**: when harness-maker bumps a template on a minor release, the existing file's hash no longer matches, so reconcile picks KEEP even if you didn't edit the file.

To pick up template updates after a version bump:

```bash
rm .claude/harness.yaml
/harness-maker:make
# (previous .claude/ is auto-backed up to .backup-<ISO>/)
```

`.cursor/rules/*.mdc` follow the same KEEP behavior. A future phase introduces a sidecar `.hm-meta.yaml` so harness-maker can hash-track Cursor assets without polluting Cursor frontmatter.

**`@hm:harness:*` inverted markers (0.12.0)** — for foreign-AI-config files we generate post-import (`.cursor/rules/*.mdc`, `AGENTS.md`, `CLAUDE.md`, etc.), content INSIDE `@hm:harness:<id>` markers is harness-owned (replaced on every render); content OUTSIDE is user-owned (byte-for-byte preserved). The block-merge parser dispatches per file extension: HTML comments for `.md`/`.mdc`, `# @hm:harness:` for `.yml`/`.yaml`, top-level `_hm_harness` JSON key for `.json`. 0.11.x files (frontmatter `generated_by: harness-maker` + zero `@hm:harness:*` markers) are treated as wholly harness-owned on first encounter post-upgrade and re-rendered into the new marker family (ADR-009 amendment).

---

## Observability

All observability is 100% local — nothing is transmitted externally.

| File | Contents |
|---|---|
| `.claude/observability/dashboard.md` | AI-readiness score, dimension breakdown, ranked action items |
| `.claude/observability/metrics-YYYY-MM-DD.jsonl` | Per-turn telemetry (cache hit %, tool calls, durations) — date-rotated daily (ADR-103, 0.7.1). Pre-0.7.1 `metrics.jsonl` is read as the trailing legacy shard. |
| `.claude/observability/refresh/raw-*.jsonl` | Anti-rot crawl evidence (accepted / rejected items) |
| `.claude/observability/security/findings-*.jsonl` | 7-gate security scan findings |
| `.claude/observability/adaptive/overrides.jsonl` | `harness_yaml_override` events with `schema_version: 1`, dual capture sites (`/hm:configure` exit primary + SessionStart secondary), dedup-keyed |
| `.claude/observability/adaptive/last-audit.txt` | Last `/hm:personalization-audit` run timestamp |

Run `/hm:ai-readiness` to regenerate the dashboard on demand.

---

## Marketplace

Marketplace manifests are present for each runtime:

- `.claude-plugin/plugin.json` — Claude Code plugin spec
- `.cursor-plugin/plugin.json` — Cursor Marketplace spec
- `.codex-plugin/plugin.json` — Codex plugin spec

Listings are **pending**. Until then, install locally:

```bash
# Any IDE — CLI install (recommended)
uv tool install harness-maker          # from PyPI (when published)
uv tool install /path/to/harness-maker # pre-PyPI, from clone
harness-maker make . --targets claude-code,cursor,codex

# Claude Code — plugin mode (optional, enables /harness-maker:make command)
claude --plugin-dir /path/to/harness-maker

# Cursor — open the repo folder directly in Cursor as a workspace plugin

# Codex — render Codex-native assets in your project harness
harness-maker make . --targets codex
```

---

## FAQ

**Q: Why Python? My project is Rust / Node / Go.**
The hooks (`permission_gate`, `worktree_gate`, `telemetry`) call `uv run python -m harness_maker.*` at PreToolUse / PostToolUse boundaries. This doesn't touch your project's toolchain — `uv` and `harness_maker` need to be on the path, but only to run hooks. Your project's build system is untouched.

**Q: Why does it require `uv`?**
`uv` gives a hermetic, fast Python environment without polluting the system or your project's virtualenv. Hooks run in milliseconds without activating anything.

**Q: Will harness-maker overwrite my hand-edits when I re-render?**
No. Every generated file carries a `content_hash` in its provenance frontmatter. Re-render compares the new template's hash against the file on disk. If they differ — meaning you edited the file — it keeps yours. See [Reconcile rules](#reconcile-rules-re-rendering-an-existing-harness).

**Q: What's the difference between `Side` and `Production`?**
`Side` is lean: 1 reviewer (code), verify-before-completion optional, worktree scope `[execute]`. `Production` is thorough: 5 reviewers, verify required, worktree scope `[execute, plan]`, security on high-finding = block. Both share the same anti-rot and caching defaults.

**Q: Does anti-rot ever auto-apply?**
Never. Every anti-rot item surfaces via a structured question (`AskQuestion` in Cursor, `AskUserQuestion` in Claude Code) in `/hm:refresh`. There is no `--auto-apply` flag and no plan to add one. The rationale: a wrong patch is worse than a stale harness.

**Q: Can I use only Claude Code? Only Cursor? Only Codex?**
Yes. `targets` is a multi-select at the interview. Claude Code uses `.claude/`; Cursor reuses most `.claude/` assets and adds `.cursor/`; Codex adds `AGENTS.md`, `.codex/`, and `.agents/skills/`.

**Q: Do my prompts or telemetry leave my machine?**
No. `metrics.jsonl`, dashboard, and security findings are written to `.claude/observability/` locally. Anti-rot crawls *read* public sources (Anthropic blog, arxiv, GitHub, OSV.dev) but never uploads anything.

**Q: How do I pick up template improvements after a `/plugin update`?**
Run `/harness-maker:make` → choose **Update**. For files where your hash matches the old template, the new version is applied. For files you edited (hash mismatch), yours is kept. To force a full refresh, `rm .claude/harness.yaml` and re-run.

**Q: What are `mechanical_checks` and when should I use them?**
Shell commands listed under `reviewers.mechanical_checks` in `harness.yaml` run at the start of every `/hm:review` — before any LLM reviewer spawns. The first command that exits non-zero emits `## MECHANICAL_BLOCK: <cmd> exit=<N>` and halts review immediately (`CHANGES_REQUESTED`). Use them for fast, deterministic gates (lint, type-check, unit test) that shouldn't waste LLM tokens when the basics are broken. The list is user-managed; harness-maker never populates it automatically. `--no-auto-fix` does not skip mechanical checks — they are a hard gate, not part of the fix loop.

**Q: Why doesn't harness-maker rewrite prompts to be model-agnostic?**
The prompts are tuned for `claude-opus-4-7` — `<thinking>` blocks, role framing, chain-of-thought structure. Rewriting for model-neutrality would degrade quality on the recommended model for hypothetical gains on others. Override `recommended_model` in `harness.yaml` if you want a different model; the prompts remain as-is.

---

## Personalization Architecture

harness-maker 0.12.0 introduces three tracks of personalization depth:

- **Detection Depth (Track A)** — `harness_maker.profile.profile()`
  detects stack (12+ languages), framework, package manager, and CI
  provider. Results land in `ProjectProfile` and feed the recommendation
  framework. Cache at `~/.cache/harness-maker/profile-<repo-hash>.json`
  (manifest-mtime invalidation + 24h TTL).

- **Foreign AI Config Migration (Track D)** — `/hm:configure` detects
  `.cursor/rules/`, `AGENTS.md`, `CLAUDE.md`, `.continue/config.json`,
  `.aider.conf.yml`, and `.github/copilot-instructions.md`. With user
  confirmation, harness-maker imports the foreign config's intent into
  `harness.yaml` and *re-generates the foreign file as part of the
  harness on every render*. The `@hm:harness:*` block markers protect
  user customizations outside the marked regions.

  **Cursor power-user constraint (ADR-003)** — single-source means
  harness re-generates `.cursor/rules/` on every render. If you prefer
  Cursor-only ownership of those rules, the only opt-out today is to
  drop the `cursor` target from `harness.yaml.targets`. A dedicated
  opt-out flag is TODO for a future PLAN.

- **Adaptive (Track B start)** —
  `harness.yaml.adaptive.disable_telemetry: false` (opt-out per
  ADR-005) enables override telemetry. `/hm:personalization-audit`
  scores your harness composite (0–100; Bronze < 40 < Silver < 65 <
  Gold < 85 ≤ Platinum) and surfaces ranked action items with
  evidence. SessionStart drift hint fires after 30 overrides or 14
  days without an audit.

All adaptive features run **100% locally** — no network calls
(asserted by `tests/unit/test_no_network.py`).

0.12.1 patch extended `CACHED_MANIFESTS` with literal filenames from `STACK_GLOB_MANIFESTS` (`stack.yaml`, `package.yaml`) so Haskell projects now properly invalidate the profile cache on those manifests' mtime bump.

---

## Roadmap

**Done (0.12.0–0.12.1):**
- Track A (Detection Depth): 12+ stacks/frameworks/pkg-mgr/CI
- Track D (Foreign AI Config Migration): 6 config types, single-source re-render, `@hm:harness:*` markers, 0.11.x migration
- Track B-start (Adaptive): override telemetry, `/hm:personalization-audit`, SessionStart drift surface

**Next (0.13.0 — Track B completion + Cursor opt-out):**
- Track B-extra: B2 permission-frequency capture + B3 reviewer-signal aggregation
- `harness.yaml.cursor.opt_out_render` flag for Cursor power-users (ADR-003 documented constraint)
- `github/spec-kit` external e2e fixture vendoring (currently pytest.skip with TODO)

**Future (0.14.0+ — Track C strategic expansion, ranked by user value):**
- C2 privacy/regulated tier (HIPAA/PCI/GDPR — enterprise entry barrier)
- C1 team-profile axis (solo vs small-team vs large-team)
- C3 code-style detection (docstring conventions, naming patterns)
- C4 cross-project user-defaults
- C5 per-developer overlay in shared harness

**Deferred-by-data:**
- Rubric v0 calibration after 30+ projects accumulate `/hm:personalization-audit` runs (passive trigger)

**Standing items:**
- PyPI publish — remove the editable-from-clone requirement.
- Claude Code + Cursor + Codex Marketplace listings — submit all plugin manifests.
- `.hm-meta.yaml` sidecar for Cursor assets — enable hash-tracking of `.cursor/rules/*.mdc` without polluting Cursor frontmatter.
- User-configurable anti-rot repo list — `harness.yaml.anti_rot.github_repos` to track additional Claude Code ecosystem repos beyond the default.
- Demo screencast — record a first-install + `/hm:loop` session.
- `Enterprise` preset — stricter security gates, mandatory spec-driven dev mode.

---

## Development

```bash
uv sync
uv run pytest                       # full suite
uv run ruff check src/ tests/       # lint
uv run ruff format src/ tests/      # format
uv run mypy --strict src/           # type check
bash .claude-verify.sh all          # phase-by-phase exit criteria + final acceptance
```

---

## Contributing

See [`docs/CONTRIBUTING.md`](docs/CONTRIBUTING.md) for adding skills/agents/presets, test patterns, and the PR checklist (including the 4-file version bump invariant).

See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for the 14 mechanisms (M1-M14) behind the system.

---

## License

MIT — see [LICENSE](LICENSE).
