
Steplog

design brief · v0.9.0

Steplog is build governance for solo agentic builders. It records what your agent did, surfaces drift, and renders the result into a single dashboard. It does not tell you what to focus on, it does not recommend, and it does not replace your judgement. Capture and nudge — never brain.

The thesis

Solo builders working with coding agents have a coordination problem that traditional project tools don't solve. The agent moves fast; you move between tasks; the project's state lives partly in your head, partly in the codebase, partly in scattered chat scrollback. Things drift, and you find out three weeks later.

Steplog's bet: most of the value comes from a small, deliberate contract between the agent and a single canonical state file. Every meaningful unit of work — a build step, a design decision, a ship event — is recorded, with both a technical summary and a plain-language description. The dashboard renders from that file. The file is the truth; the dashboard is a view.

That's it. No queues, no prioritisation schemes, no AI ranking your work for you. Just a clean record of what happened, surfaced in a way you can actually read.
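To make the contract concrete, here is a minimal Python sketch of recording one unit of work with both a technical summary and a plain-language description. The field names (`kind`, `technical`, `plain`, `at`), the set of activity kinds, and the `log_activity` helper are hypothetical; the real shape is whatever the project's JSON Schema defines.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical activity kinds; the real schema may define a different set.
ACTIVITY_KINDS = {"build_step", "decision", "ship"}


@dataclass(frozen=True)
class Activity:
    kind: str       # hypothetical: one of ACTIVITY_KINDS
    technical: str  # technical summary, for whoever reviews the code
    plain: str      # plain-language description, for the operator
    at: str         # ISO 8601 timestamp in UTC


def log_activity(state: dict, kind: str, technical: str, plain: str) -> dict:
    """Return a new state dict with one activity appended; the input is not mutated."""
    if kind not in ACTIVITY_KINDS:
        raise ValueError(f"unknown activity kind: {kind!r}")
    if not technical.strip() or not plain.strip():
        raise ValueError("both a technical summary and a plain-language description are required")
    entry = Activity(kind, technical, plain, datetime.now(timezone.utc).isoformat())
    return {**state, "activities": [*state.get("activities", []), asdict(entry)]}
```

Refusing an entry that lacks either description is one way to keep the record readable by both the agent and the operator.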

The architecture

state.json as the canonical contract

One file at .steplog/state.json. It validates against the JSON Schema for the schema version the project pins. Sections, activities, decisions, nudges, the security inventory, and the drift allowlist all live there. The schema is the source-of-truth contract; every agent reads from and writes to it.
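Full validation is the JSON Schema's job, but a stdlib-only sketch of the first checks shows the idea: parse the file, compare the schema version against the pin, and confirm each section exists. The key names (`schema_version`, `security_inventory`, `drift_allowlist`, and so on) and the `check_state` helper are assumptions mirroring the sections listed above, not Steplog's actual schema.

```python
import json

# Hypothetical top-level keys mirroring the sections named in the brief.
REQUIRED_LISTS = ("sections", "activities", "decisions", "nudges",
                  "security_inventory", "drift_allowlist")


def check_state(raw: str, pinned_version: str) -> list[str]:
    """Return a list of problems; an empty list means the basic shape is OK."""
    try:
        state = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(state, dict):
        return ["top level must be an object"]
    problems = []
    # The file must declare exactly the schema version the project pins.
    if state.get("schema_version") != pinned_version:
        problems.append(f"schema_version {state.get('schema_version')!r} "
                        f"does not match pin {pinned_version!r}")
    for key in REQUIRED_LISTS:
        if not isinstance(state.get(key), list):
            problems.append(f"{key} missing or not a list")
    return problems
```

Returning a list of problems rather than raising on the first one lets a binding report everything wrong with a file in a single pass.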

Bindings as SDKs around the state file

Steplog ships with a CLI (steplog) and a Claude Code plugin. Both wrap the same state-file operations: initialize, log an activity, validate, render, archive, and migrate. Codex, Cursor, and Aider can adopt the same protocol from AGENT_PROTOCOL.md; nothing in the contract is Claude-specific. Cross-agent portability is a structural commitment, not a posture.

The dashboard is a renderer

A single Python script reads state.json and emits BUILD_LOG.html. Every panel — the build map, the lifecycle pipeline, the activity timeline, the now panel, the nudges, the security inventory — is a pure function from state to HTML. The renderer doesn't connect to anything. No external services, no telemetry, no auto-recommendation engine. The output is a single static file you open in a browser.
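A minimal sketch of what "a pure function from state to HTML" means in practice. The panel and page function names, the activity fields, and the markup are hypothetical; the point is that each panel takes state in and returns a string, with no I/O, network, or clock access inside it, and that everything read from the state file is escaped.

```python
import html


def render_activity_timeline(state: dict) -> str:
    """Pure function: the same state always produces the same HTML."""
    items = []
    for act in state.get("activities", []):
        # Escape everything taken from state.json; the file is data, not markup.
        items.append(
            f"<li><strong>{html.escape(act['plain'])}</strong>"
            f"<br><code>{html.escape(act['technical'])}</code></li>"
        )
    return '<ul class="timeline">' + "".join(items) + "</ul>"


def render_page(state: dict) -> str:
    # Each panel is just a function of state; the page concatenates them.
    panels = [render_activity_timeline]
    body = "\n".join(panel(state) for panel in panels)
    return f"<!doctype html>\n<html><body>\n{body}\n</body></html>\n"
```

Writing the result to BUILD_LOG.html is then the only side effect, done once by the calling script, for example with `pathlib.Path("BUILD_LOG.html").write_text(render_page(state))`.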

The staged path

Steplog deliberately ships in stages. Each stage is a full product; the next stage layers on without breaking what came before.

Stage 1 (where we are): Personal, with one operator and one project. CLI + Claude Code plugin. Single state file, single dashboard. The version you're using right now.

Stage 1.5: Cross-project briefing. One operator, multiple Steplog projects. A roll-up view that surfaces which project needs attention without telling you what to do.

Stage 2: OSS release. Public marketplace listing for the Claude Code plugin. Every binding (CLI, plugin) is fully open and operator-customisable.

Stage 3: MCP server. Steplog as a tool any agent can call into directly, not just via slash command or CLI invocation. State-file operations exposed as discrete MCP tools.

Stage 4: Hosted SaaS with rules packs. For teams or individual operators who'd rather not self-host. Pluggable rules packs per industry or discipline. Still capture and nudge, never brain.

The three constraints

Three rules that the renderer, the schema, and every binding follow. They are not aspirations — they are the reason Steplog can be trusted.

Constraint A

Capture and nudge — never brain.

The dashboard may show what is odd, stale, or contradictory. It may not tell the operator what to focus on or what is most important. Words like "should", "recommended", "priority", and "must" are explicitly banned from every computed surface. A build-time assertion in the renderer enforces this on the longest-stale callout; the same check would apply anywhere else prescriptive language could leak.
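One way such a build-time assertion could look, as a sketch: a word-boundary regex over the banned words listed above, applied to any computed text before it reaches the page. The helper names and the callout wording are hypothetical, and matching whole words case-insensitively is an assumption about how the real check behaves.

```python
import re

# Words the brief lists as banned in computed surfaces.
BANNED = ("should", "recommended", "priority", "must")
_BANNED_RE = re.compile(r"\b(" + "|".join(BANNED) + r")\b", re.IGNORECASE)


def assert_descriptive(text: str) -> str:
    """Raise if computed copy contains prescriptive language; otherwise return it unchanged."""
    match = _BANNED_RE.search(text)
    if match:
        raise AssertionError(
            f"prescriptive word {match.group(0)!r} in computed surface: {text!r}")
    return text


def longest_stale_callout(section: str, days: int) -> str:
    # Hypothetical callout wording: it states the fact and stops there.
    return assert_descriptive(f"{section} has not changed in {days} days.")
```

Because the check wraps the string at the point it is produced, a prescriptive phrase fails the build instead of shipping in BUILD_LOG.html.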

Constraint B

Cross-agent portable.

Every binding produces a valid state.json from any compliant input. The standalone generator renders a valid BUILD_LOG.html from any compliant state.json without binding affordances. No agent gets special treatment; no binding can extend the schema in agent-specific ways without a corresponding schema update.
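The "no agent-specific extensions" rule can be checked mechanically. In JSON Schema terms it corresponds to `"additionalProperties": false` at the top level; the sketch below does the same with a plain set difference. The allowed key set and the helper names are hypothetical stand-ins for whatever the pinned schema version defines.

```python
# Hypothetical set of top-level keys allowed by the pinned schema version.
SCHEMA_KEYS = {"schema_version", "sections", "activities", "decisions",
               "nudges", "security_inventory", "drift_allowlist"}


def unknown_keys(state: dict, allowed: set[str] = SCHEMA_KEYS) -> list[str]:
    """Top-level keys the schema does not define, e.g. a binding-specific extension."""
    return sorted(set(state) - allowed)


def accept_from_binding(state: dict) -> dict:
    extras = unknown_keys(state)
    if extras:
        # A binding may not widen the contract on its own; the schema has to change first.
        raise ValueError(f"keys not in schema: {', '.join(extras)}")
    return state
```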

Constraint C

Derivation transparency.

Every metric or rendered element computed from state.json (rather than read directly) is visually labelled as derived. The operator can always tell the difference between "Steplog wrote this down" and "Steplog computed this from what was written down." This is what makes the % complete number safe to show — not because the number is precise, but because its derived nature is visible.
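A minimal sketch of keeping the recorded/derived distinction in the data itself, so the renderer cannot forget to label it. The `Derived` wrapper, the "share of sections marked done" rule for % complete, and the markup are all hypothetical; Steplog's actual derivation may differ.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Derived:
    """A value computed from state.json rather than read from it."""
    value: str
    how: str  # one-line description of the derivation, shown to the operator


def percent_complete(state: dict) -> Derived:
    # Hypothetical rule: share of sections whose status is "done".
    sections = state.get("sections", [])
    done = sum(1 for s in sections if s.get("status") == "done")
    pct = round(100 * done / len(sections)) if sections else 0
    return Derived(f"{pct}%", f"{done} of {len(sections)} sections marked done")


def render_value(v) -> str:
    if isinstance(v, Derived):
        # Derived values always carry a visible label plus their derivation.
        return (f'<span class="derived" title="{html.escape(v.how)}">'
                f'{html.escape(v.value)} <small>(derived)</small></span>')
    return html.escape(str(v))  # recorded values render as written
```

Making the label a property of the type, rather than of each panel's template, means a new panel gets the label for free.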

The dogfood case

Steplog governs Steplog.

The repo at github.com/datafrogger/steplog uses Steplog itself as its build-governance layer. Six committed feature packs have shipped under self-governance:

Every commit on main has been logged through the protocol. Every prod-marking decision was made via the same mechanism the dashboard surfaces. When a contradiction was caught — a section marked stable while a lifecycle stage said in-progress — it was surfaced through the same nudge engine the dashboard ships with.
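The contradiction described above can be sketched as a small nudge rule. The section shape (`id`, `status`, and a `lifecycle` map from stage name to status) and the function name are assumptions for illustration; note that the nudge text only describes the mismatch, in line with Constraint A.

```python
def contradiction_nudges(state: dict) -> list[str]:
    """Flag sections marked stable while any lifecycle stage is still in progress.

    The nudge describes the mismatch only; it does not say what to do about it.
    """
    nudges = []
    for section in state.get("sections", []):
        if section.get("status") != "stable":
            continue
        open_stages = [stage for stage, status in section.get("lifecycle", {}).items()
                       if status == "in-progress"]
        if open_stages:
            nudges.append(f"{section['id']} is marked stable, but lifecycle "
                          f"stage(s) {', '.join(open_stages)} are in progress.")
    return nudges
```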

The dashboard you build with Steplog is the same dashboard we governed Steplog's development with. That's the strongest argument we know how to make.

What Steplog is not

The boundary is what makes the tool trustworthy. Steplog does one thing — capture and nudge — and refuses to drift into the things it isn't.