Metadata-Version: 2.4
Name: harnessforge
Version: 0.2.2
Summary: Universal harness layer for AI coding agents — one command sets up your repo for any agent (Claude Code, Cursor, Codex, Gemini CLI, Aider, OpenHarness).
Project-URL: Homepage, https://github.com/jcaiagent7143-ui/harnessforge
Project-URL: Documentation, https://jcaiagent7143-ui.github.io/harnessforge
Project-URL: Repository, https://github.com/jcaiagent7143-ui/harnessforge
Project-URL: Issues, https://github.com/jcaiagent7143-ui/harnessforge/issues
Project-URL: Changelog, https://github.com/jcaiagent7143-ui/harnessforge/blob/main/CHANGELOG.md
Author: The harnessforge Contributors
License: MIT
License-File: LICENSE
Keywords: agent,agent-blueprint,agents-md,ai,aider,claude,claude-code,codex,cursor,gemini-cli,harness,llm,mcp,openai,openharness,rag,skills
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: anyio>=4.3
Requires-Dist: httpx>=0.27
Requires-Dist: jinja2>=3.1
Requires-Dist: pydantic>=2.6
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.7
Requires-Dist: typer>=0.12
Provides-Extra: all
Requires-Dist: anthropic>=0.40; extra == 'all'
Requires-Dist: fastapi>=0.110; extra == 'all'
Requires-Dist: google-genai>=0.3; extra == 'all'
Requires-Dist: litellm>=1.40; extra == 'all'
Requires-Dist: mcp>=1.0; extra == 'all'
Requires-Dist: numpy>=1.26; extra == 'all'
Requires-Dist: ollama>=0.3; extra == 'all'
Requires-Dist: openai>=1.40; extra == 'all'
Requires-Dist: textual>=0.60; extra == 'all'
Requires-Dist: uvicorn[standard]>=0.29; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.40; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0; extra == 'dev'
Requires-Dist: vcrpy>=6.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs>=1.6; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.25; extra == 'docs'
Provides-Extra: embeddings
Requires-Dist: numpy>=1.26; extra == 'embeddings'
Provides-Extra: gemini
Requires-Dist: google-genai>=0.3; extra == 'gemini'
Provides-Extra: litellm
Requires-Dist: litellm>=1.40; extra == 'litellm'
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == 'mcp'
Provides-Extra: ollama
Requires-Dist: ollama>=0.3; extra == 'ollama'
Provides-Extra: openai
Requires-Dist: openai>=1.40; extra == 'openai'
Provides-Extra: proxy
Requires-Dist: fastapi>=0.110; extra == 'proxy'
Requires-Dist: uvicorn[standard]>=0.29; extra == 'proxy'
Provides-Extra: tui
Requires-Dist: textual>=0.60; extra == 'tui'
Provides-Extra: web
Requires-Dist: fastapi>=0.110; extra == 'web'
Requires-Dist: uvicorn[standard]>=0.29; extra == 'web'
Description-Content-Type: text/markdown

<div align="center">

# harnessforge

### One command turns any repo into a project where Claude Code, Cursor, Codex, Gemini CLI, Aider, and any other coding agent show up already knowing the codebase.

```bash
uvx harnessforge init
```

[![PyPI](https://img.shields.io/pypi/v/harnessforge?label=pypi)](https://pypi.org/project/harnessforge/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![CI](https://github.com/jcaiagent7143-ui/harnessforge/actions/workflows/test.yml/badge.svg)](https://github.com/jcaiagent7143-ui/harnessforge/actions/workflows/test.yml)
[![Tests](https://img.shields.io/badge/tests-186%20passing-brightgreen)](https://github.com/jcaiagent7143-ui/harnessforge/actions/workflows/test.yml)

</div>

---

## Why this exists

> **2026 is the year developers still build the harness.**
> **2027 is the year the LLM builds its own harness.**

Today, every agent system still needs humans to manually prepare the
environment: MCP servers, repo instructions, memory files, test commands,
validation scripts, permission rules, browser credentials, and the
task-loop scaffolding that holds the whole thing together. OpenAI's own
definition of an agent — a system that plans, calls tools, collaborates,
and keeps state across multi-step work — depends on every one of those
layers being in place *before* the first plan step runs. MCP has emerged
as the common connection layer for tools, data, and workflows. But
*which* tools to connect, *what's* forbidden, *what counts as done* —
those still get hand-authored, once per project, and then re-authored
for the next.

In 2026, developers still spend too much time on this setup: MCP tools,
repo rules, test commands, memory files, browser validation, credentials,
workflow loops. By 2027, I don't think they will. The LLM will inspect
the project, understand the task, generate the right harness, connect
the right tools, create its own memory, write its own validation scripts,
and keep refining the loop until the task is done. The harness layer
disappears as a separately-authored artifact.

**The next big open-source project won't be another coding agent. It
will be the universal harness layer that every coding agent can use** —
one simple framework that lets Claude, OpenAI, Gemini, local models, and
future agents download a project, understand its environment, and call
tools safely through a common interface. Model-neutral by design, because
the model is the part that keeps changing.

**`harnessforge` is the 2026 bridge.** A deterministic repo walker plus an
opinionated blueprint set. In ~3 seconds, fully local with no network
calls, it generates everything your coding agent needs to start fast —
`AGENTS.md`, `SOUL.md`, `TOOLS.md`, `MEMORY.md`, `SKILLS/`, per-IDE
adapter files, blueprint validators, MCP recommendations, forbidden-path
rules. **Your coding agent stays the brain** — harnessforge just lays the
ground truth it reads on startup. You commit the output once per repo and
every coding agent you use shows up already knowing the codebase. When
the next generation of models can build this layer on the fly themselves,
harnessforge has done its job and ages out gracefully.

---

## What you get

Run `harness init` in your repo. ~3 seconds later, you have:

```
your-repo/
├── AGENTS.md           ← every coding agent reads this (OpenAI Codex CLI convention)
├── SOUL.md             ← personality / tone for this project
├── TOOLS.md            ← which tools / MCPs to use
├── MEMORY.md           ← memory schemas
├── SKILLS/             ← anthropics/skills-compatible procedures
│   ├── chunk-and-embed/SKILL.md
│   ├── retrieve-and-rerank/SKILL.md
│   └── …
├── .claude/CLAUDE.md       ← Claude Code reads this automatically
├── .cursor/rules           ← Cursor reads this automatically
├── .continue/config.json   ← Continue reads this automatically
├── .windsurf/rules         ← Windsurf reads this automatically
├── harness.config.json     ← what blueprint this repo is bound to
└── .harness/
    ├── profile.yaml        ← the canonical machine-readable description
    ├── manifest.json       ← sha256 of every file for safe re-runs
    └── memory_schemas/     ← JSON Schemas the blueprint expects
```

These aren't placeholder stubs. Here's the first 25 lines of an `AGENTS.md` `harnessforge init` produced for a tiny stock-analysis repo:

```markdown
# AGENTS.md

> _Generated by harnessforge v0.2.1 · blueprint `finance-agent` v1.0.0._

You are a **finance / market-data analyst agent** working in **portfoliowatch**.

This project is **read-only by default.** You fetch market data, compute
signals, surface insights. You **never** place orders, move money, or
modify positions without an **explicit per-action human-approval gate**
that the user typed "yes" through in this session.

---

## The analyst loop

```
fetch → compute → screen → flag
```

- `SKILLS/fetch-market-data` — get prices/quotes/fundamentals; respect rate limits.
- `SKILLS/compute-technicals` — RSI, SMA, MACD, etc. Vectorized; tested against canonical references.
- `SKILLS/screen-positions` — filter the universe by your declared criteria.
- `SKILLS/flag-attention` — surface what changed and why — calibrated, not alarmist.
```

A Claude Code session opened in that repo reads it and knows the loop, the
safety contract, and which skills to invoke — without you typing any of it
into the chat.

---

## Install

```bash
# No install, run once:
uvx harnessforge init

# Or install globally:
pipx install harnessforge
harness init

# Or pip into a venv:
pip install harnessforge
```

Init is **fully deterministic by default** — no LLM call, no network, no
API key, ~2 seconds. The optional MCP-server install lets your coding
agent call `harnessforge verify` and `harnessforge inspect` as typed
tools:

```bash
pip install "harnessforge[mcp]"   # expose harnessforge itself as an MCP server
```

> Older releases also shipped `[anthropic]` / `[openai]` / `[gemini]` extras
> that called an LLM during `init` to "refine" the profile. They still work,
> but we now recommend against them: if you want LLM-assisted refinement,
> let your coding agent (Claude Code, Cursor, Codex, Gemini CLI, Aider) do
> it after `init` — it has the full repo context and a chat loop, both of
> which a one-shot init-time call doesn't. These extras are scheduled for
> deprecation in 0.3.

---

## Five blueprints in 0.2.x

Pick with `--blueprint`, or let the recommender choose based on inspection
(yfinance deps → `finance-agent`; langchain/qdrant → `rag-agent`; airflow
→ `workflow-agent`; generic Python → `python-cli-app`).

| Blueprint | For | Skills it ships |
|---|---|---|
| **`python-cli-app`** | Build a Python CLI / library / web API — the default for greenfield Python work | `add-cli-command`, `add-unit-test`, `manage-dependency`, `check-style` |
| **`finance-agent`** | Market data + portfolio analysis | `fetch-market-data`, `compute-technicals`, `screen-positions`, `flag-attention` + `no_trades_without_gate` validator that fails if generated code calls a broker function without an approval check |
| **`rag-agent`** | Retrieval-augmented Q&A with citation enforcement | `chunk-and-embed`, `retrieve-and-rerank`, `answer-with-citations`, `eval-recall-precision` + citation cross-checker |
| **`support-agent`** | Customer support: intent → KB → ticket → escalate | `classify-intent`, `retrieve-kb-answer`, `file-ticket`, `escalate-if-unresolved` + ticket-lineage validator |
| **`workflow-agent`** | Multi-step orchestration (Zapier/n8n-style) | `decompose-task`, `call-tool-with-retry`, `check-result` + tool-log + idempotency validators |

Beyond the catalog you can author project-specific skills:

```bash
harness skills add fetch-portfolio-prices \
  --domain --description "Fetch live prices for tickers in positions.json from Polygon."
```

Domain skills land under `SKILLS/domain/<name>/SKILL.md` and surface in
`harness skills list` alongside blueprint-shipped skills.

---

## Validation that's actually a contract, not a vibe

`harness verify` runs blueprint-defined checks and emits a stable JSON
contract — the same contract whether you call the CLI or the MCP tool. Your
coding agent reads the JSON and self-corrects:

```bash
$ harness verify --json
{
  "schema_version": 1,
  "blueprint": "finance-agent",
  "checks": [
    {"name": "structure", "status": "pass", "duration_ms": 1, "messages": []},
    {"name": "tests",     "status": "pass", "duration_ms": 42, "messages": []},
    {"name": "no_trades_without_gate", "status": "pass", "duration_ms": 8, "messages": []}
  ],
  "summary": {"total": 3, "passed": 3, "failed": 0}
}
```

Exit codes: `0` all pass · `1` failures · `2` config error · `3` not a harness-bootstrapped repo. Drop in CI:

```yaml
- run: pip install harnessforge
- run: harness sync --check    # fail if generated files drifted from manifest
- run: harness verify --json   # fail if blueprint contract broken
```

The `no_trades_without_gate` validator on `finance-agent` is the spicy one:
it static-scans the repo for `place_order(`, `buy(`, `sell(`, etc., and
fails the build if any of them sit behind only a config flag instead of a
runtime user-approval gate. See `docs/concepts/trust-model.md` for the full
trust boundary (`profile.test_command` and `lint_command` are executed as
shell — same model as `make test` / `npm test`).

---

## Commands

```
harness init [PATH]                       # inspect → profile → blueprint → render
harness sync [PATH] [--check]             # re-render adapters; --check = drift detect for CI
harness inspect [PATH] [--json|yaml]      # deterministic InspectionReport
harness verify [TARGET] [--json] [--tests|--lint]
harness blueprint {list, show, apply}
harness skills {list, show, add [--domain]}
harness doctor                            # diagnose env: provider keys, extras, repo health
harness mcp                               # run stdio MCP server (5 typed tools)
harness version
```

---

## Use with your existing coding agent

| Agent | What it reads | Setup |
|---|---|---|
| Claude Code | `.claude/CLAUDE.md` | nothing — automatic |
| Cursor | `.cursor/rules` | nothing |
| Codex CLI | `AGENTS.md` | nothing |
| Continue | `.continue/config.json` | nothing |
| Windsurf | `.windsurf/rules` | nothing |
| Gemini CLI / Aider | `AGENTS.md` | nothing |
| Anything that speaks MCP | `harness mcp` exposes 5 typed tools | add to client config |

For the MCP path:

```json
{
  "mcpServers": {
    "harness": {
      "command": "uvx",
      "args": ["--from", "harnessforge[mcp]", "harness", "mcp"]
    }
  }
}
```

Exposes `harness_inspect`, `harness_blueprint_list`, `harness_skills_list`,
`harness_verify`, `harness_profile_read` as typed tools your agent can call.

---

## Hero demos (reproducible from this repo)

Three end-to-end demos against real public repos, pinned to specific SHAs:

| Demo | Repo | Blueprint | Run |
|---|---|---|---|
| FastAPI + RAG | [fastapi/full-stack-fastapi-template](https://github.com/fastapi/full-stack-fastapi-template) | `rag-agent` | [`examples/hero/fastapi_rag/run.sh`](examples/hero/fastapi_rag/run.sh) |
| Zulip + Support | [zulip/zulip](https://github.com/zulip/zulip) | `support-agent` | [`examples/hero/zulip_support/run.sh`](examples/hero/zulip_support/run.sh) |
| Airflow + Workflow | [apache/airflow](https://github.com/apache/airflow) | `workflow-agent` | [`examples/hero/airflow_workflow/run.sh`](examples/hero/airflow_workflow/run.sh) |

Each `run.sh` shallow-clones the upstream repo at the pinned SHA, runs
`harnessforge init`, and runs `harnessforge verify` to confirm everything passes. CI runs all three on every push.

---

## How this compares to alternatives

The "agent infrastructure" space has runtimes (Hermes, OpenClaw,
OpenHarness), SDKs (OpenAI Agents SDK, Mastra), and now provisioners
(harnessforge). The honest version of the comparison — including when
*not* to use harnessforge — lives in [`docs/concepts/vs-hermes-openclaw-openharness.md`](docs/concepts/vs-hermes-openclaw-openharness.md).

The shortest version: **if you write your own `CLAUDE.md` for every repo
you start, harnessforge replaces that file with one that's actually
project-specific, plus everything the other coding agents you use need
to read.** That's the comparison most readers are actually making.

---

## Quality bar

- **186 tests** across unit / golden-file / interop / integration tiers, all green on Python 3.11 / 3.12 / 3.13
- **83% line coverage** on `src/harness/`
- **mypy strict** + **ruff** clean
- **mkdocs --strict** builds clean
- **Fresh-venv install verified** — `pip install harnessforge && harness version` works on a clean machine (the v0.2.1 release was held until this passed)
- **Two rounds of real-agent A/B evaluation** — Claude Code building the same stock-analysis agent WITH vs. WITHOUT the harness; the diff drove the v0.2 + v0.2.1 designs. See [`CHANGELOG.md`](CHANGELOG.md) for the per-fix-per-eval breakdown.

---

## Status

`harnessforge` 0.2.1 — first public release. Following [Semantic Versioning](https://semver.org).

Feedback welcome via issues — see [CONTRIBUTING.md](CONTRIBUTING.md) and
[SECURITY.md](SECURITY.md). The 5 shipped blueprints are intentionally
opinionated; PRs proposing a 6th are encouraged. Sales / browser blueprints
land in 0.3 alongside auth-bearing MCP catalog entries.

## License

MIT — see [LICENSE](LICENSE).
