Metadata-Version: 2.4
Name: continuity-mcp
Version: 0.1.0
Summary: Agent memory with receipts: an MCP server over an append-only, hash-chained ledger for task state, memory, and verified handoffs across sessions and models.
License-Expression: Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: mcp>=1.2
Provides-Extra: api
Requires-Dist: fastapi>=0.110; extra == "api"
Requires-Dist: uvicorn>=0.29; extra == "api"
Requires-Dist: pydantic>=2.6; extra == "api"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: fastapi>=0.110; extra == "dev"
Requires-Dist: uvicorn>=0.29; extra == "dev"
Requires-Dist: pydantic>=2.6; extra == "dev"
Dynamic: license-file

# Continuity

Continuity is the system of record for agent work: an agent continuity layer
that captures task state, memory, decisions, provenance, and handoff context
across models, sessions, and tools.

Current product premise: Continuity is not "better memory than a Markdown file"
for simple, sequential work. A strong, maintained `HANDOFF.md` matched
Continuity in the Stage A product-falsification benchmark. Continuity's sharper
claim is that it is a trust layer for AI work: it turns messy agent history
into resumable, inspectable, permissioned operational state, and can compile
that state into a compact human-readable handoff when a plain file is the
simplest interface.

The event log is the product. Task context, project memory, agent memory,
workflow state, gates, timeline rows, and the Console are projections of one
append-only, hash-chained ledger.

Every event can carry optional payload-level provenance and usage telemetry:
task/session identity, agent/model identity, parent and consumed ledger
sequence numbers, handoff source, token counts, usage source, and exact
microdollar cost. This metadata stays in the payload layer so the immutable
chain remains stable while provenance evolves.

The primary continuation path is structured context compilation: a task-chain
projection that returns current task state, ordered chronology, decisions,
unresolved conflict signals, and provenance chain. It is available through the
Python helper, FastAPI at `/projects/{project_id}/tasks/{task_id}/continuation`,
and MCP as `compile_task_context`.
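
For example, the FastAPI route can be read with the standard library alone. This sketch assumes the route answers an unauthenticated `GET` with a JSON body and that the server is listening at the `make serve` address; the response schema is not assumed, so the result is returned as a plain `dict`.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # address used by `make serve`


def continuation_url(project_id: str, task_id: str, base_url: str = BASE_URL) -> str:
    """Build the continuation route, percent-encoding both identifiers."""
    project = urllib.parse.quote(project_id, safe="")
    task = urllib.parse.quote(task_id, safe="")
    return f"{base_url}/projects/{project}/tasks/{task}/continuation"


def fetch_continuation(project_id: str, task_id: str,
                       base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Fetch compiled task context; assumes GET and a JSON response."""
    url = continuation_url(project_id, task_id, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```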

Operational conflicts are first-class ledger events. When two agents produce
contradictory task assessments, Continuity records an `OPERATIONAL_CONFLICT`
that links the exact event sequences in disagreement. The conflict stays visible
in compiled context, through FastAPI at
`/projects/{project_id}/tasks/{task_id}/conflicts`, and through MCP until a
human records an `OPERATIONAL_CONFLICT_RESOLVED` event.
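
The "remains until resolved" behavior can be pictured as a fold over ordered events. The two event type names come from this README; the `seq`, `conflicting_seqs`, and `resolves_seq` fields are hypothetical stand-ins for however the real payloads link events.

```python
def unresolved_conflicts(events: list[dict]) -> list[dict]:
    """Fold ordered ledger events into the conflicts still awaiting resolution."""
    open_conflicts: dict[int, dict] = {}
    for event in events:
        if event["type"] == "OPERATIONAL_CONFLICT":
            open_conflicts[event["seq"]] = event
        elif event["type"] == "OPERATIONAL_CONFLICT_RESOLVED":
            # A resolution closes the conflict it points at; nothing is deleted
            # from the ledger itself, only from this projection.
            open_conflicts.pop(event["resolves_seq"], None)
    return list(open_conflicts.values())
```

Since the projection is derived from the log, the original conflict and its resolution both remain auditable after the conflict drops out of compiled context.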

## Product Falsification Checkpoint

The Phase 6 entry gate compared three continuation paths: no shared context, a
strong manually maintained `HANDOFF.md`, and Continuity via MCP. The 30-run
local matrix produced no-context `0/10`, strong `HANDOFF.md` `10/10`, and
Continuity `10/10`, so the result was a tie and durable runner work remains
blocked. See `docs/testing/2026-06-24-product-falsification-results.md`.

The next product-validation target is not another runner feature. It is a
Continuity-backed handoff workflow: use the ledger, memory, provenance,
validation, and conflict model to generate or verify a compact `HANDOFF.md`,
then test whether that beats a manual handoff under concurrency, stale-context
recovery, permissioned context, or audit requirements.

The follow-up IDEA-002 benchmark tested that narrower claim:

```bash
CONTINUITY_BENCHMARK_LOCAL_MODEL=qwen3.5:9b make trust-layer-handoff-benchmark
```

The completed 27-run local matrix produced no-context `0/9`, manual handoff
`6/9`, and Continuity-backed verified handoff `9/9` under the original scoring.
That scoring could not credit the manual arm in provenance-audit scenarios even
when it honestly reported `unverified`; scoring was revised on 2026-07-02 to
score provenance honesty equally and track ledger-backed verification as a
separate capability. The honest claim: a well-maintained manual handoff
preserves task facts, but only Continuity provides ledger-backed source
verification, which a manual handoff structurally cannot. See
`docs/testing/2026-06-24-trust-layer-handoff-results.md`.

## Quick Start

As a user (installs the `continuity-mcp` MCP server command and
`continuity-handoff` receipt exporter):

```bash
pip install "git+https://github.com/machinedigital-ai/Continuity.git"
claude mcp add continuity --env CONTINUITY_DB="$HOME/continuity.db" -- continuity-mcp
```

As a developer (from a checkout):

```bash
make setup
make test
make serve
```

New here? [The tutorial](docs/TUTORIAL.md) walks through the full loop in ~15
minutes: connect an agent, record work, resume in a fresh session or a
different model, export the verified receipt, and read the receipt fields.

With `make serve` running, open the read-only Console:

```text
http://127.0.0.1:8000/console
```

## Common Commands

```bash
make setup   # create/update continuity-core/.venv and install requirements
make test    # run the test suite
make serve   # run FastAPI at http://127.0.0.1:8000
make mcp     # run the MCP server
make demo    # run the recursive proof demo
make proof   # export and verify the multi-model proof artifact
make ollama-chain # run the local multi-model Ollama chain proof
make codex-persistence-proof  # run two isolated Codex sessions through Continuity
make claude-persistence-proof # run two isolated Claude sessions through Continuity
make cross-agent-persistence-proof # run the Codex-to-Claude handoff proof
make trust-layer-handoff-benchmark # run the IDEA-002 verified handoff benchmark
make list-verified-handoff-tasks   # list exportable task IDs from a ledger
make export-verified-handoff       # export verified HANDOFF.md from a task ledger
```

## Proof Artifact

The completed Phase 4 Trust and Proof work exports a shareable Continuity proof
artifact from real ledger events:

```bash
make proof
```

The command writes `examples/multi_model_code_review.jsonl`, a permissioned
internal artifact showing a real multi-agent review sequence: external review,
Codex triage, operational conflict, human resolution, gate approval, selected
timeline, continuation context, provenance, and ledger integrity.

The artifact intentionally exports sanitized summary payloads. It retains
original ledger hashes and verifies exported chain-entry metadata, but it is
not yet a standalone public notary proof for omitted ledger events.

## Local Model Proofs

The local Ollama chain proof tests continuity across installed local models:

```bash
make ollama-chain
```

The dated finding and corrected targeted rerun are recorded in
`docs/testing/2026-06-19-local-ollama-chain-proof.md`. Across three corrected
runs, Continuity achieved 18/18 context-fidelity assertions, 9/9 provenance
assertions, 9/9 exact model outputs, and 3/3 valid ledgers using independent
Qwen, Gemma, and Ministral model families.

## Persistent Agent Proofs

The same-agent proof harness starts two fresh client processes connected only
through a dedicated Continuity SQLite ledger. It creates a random challenge
after Session A exits, gives Session B only stable project/task/agent IDs, then
verifies the output, linked validation, completion turn, ledger integrity, and
sanitized proof artifact from recorded events.

```bash
make codex-persistence-proof
make claude-persistence-proof
make cross-agent-persistence-proof
```

Codex runs ephemerally with native memories disabled. Claude runs without
session persistence and with only the explicit Continuity MCP configuration.
See [the persistent-agent proof runbook](docs/testing/persistent-agent-proof-runbook.md)
for exact boundaries, authentication checks, and troubleshooting. The
cross-agent mode stores its post-Codex challenge in shared project memory and
requires Claude's output and completion turn to carry Codex handoff provenance.

## Product Falsification Gate

Before Phase 6 runner work, Continuity is compared against no context and a
strong structured `HANDOFF.md`. Stage A runs 30 fresh local targets:

```bash
CONTINUITY_BENCHMARK_LOCAL_MODEL=qwen3.5:9b make product-falsification-stage-a
```

All Stage A arms use the same direct Ollama transport. The harness reads the
handoff or retrieves Continuity context through the real MCP stdio server
before the fresh call, isolating context quality from client tool-use behavior.

Results are written to `continuity-core/examples/product_falsification_results.jsonl`
and `docs/testing/2026-06-24-product-falsification-results.md`. Stage B runs
with fresh Codex and Claude clients only when Stage A passes the pre-registered
rules. A tie or loss keeps Phase 6 blocked and triggers simplification or
repositioning; it is not treated as a Continuity win.

The completed Stage A result was a tie: no context `0/10`, strong
`HANDOFF.md` `10/10`, and Continuity `10/10`. Stage B was therefore skipped and
Phase 6 runner work remains blocked. See the
[dated benchmark report](docs/testing/2026-06-24-product-falsification-results.md)
and [sanitized result rows](continuity-core/examples/product_falsification_results.jsonl).

The completed IDEA-002 trust-layer handoff benchmark validated the verified
handoff wedge, not the durable runner: continuation quality was comparable to a
maintained manual handoff, and only Continuity satisfied ledger-backed source
verification (see the scoring revision note in the dated results doc).

The first productized verified handoff surface is now available:

- Python: `continuity.handoff.build_verified_handoff(store, project_id=..., task_id=...)`
- FastAPI: `GET /projects/{project_id}/tasks/{task_id}/verified-handoff`
- MCP: `export_verified_handoff(project_id, task_id)`
- Installed CLI:

```bash
continuity-handoff --db continuity-core/continuity.db --list-tasks

continuity-handoff \
  --db continuity-core/continuity.db \
  --project-id your-project \
  --task-id your-task \
  --out HANDOFF.md
```

The exporter renders human-readable Markdown with current task state, decisions,
rejected/superseded signals, unresolved conflicts, and traceable `ledger_seq` /
`event_hash` source rows. `provenance_status` is computed from a full ledger
integrity check at export time: a valid chain renders `verified` with the tip
hash and Merkle root; a tampered ledger renders `integrity_failed` with the
broken sequence. The exporter leaves the ledger as the source of truth and does
not unblock durable runner infrastructure.
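
A sketch of how an export-time status could be derived is shown below. The `provenance_status` values, `ledger_seq`, and `event_hash` are names from this README; the row layout, the hashing rule, and the Merkle construction (SHA-256 over concatenated hex digests, duplicating an odd node) are assumptions and may not match the real exporter.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaf_hashes: list[str]) -> str:
    """Pairwise-hash hex digests up to one root; an odd node is paired with itself."""
    if not leaf_hashes:
        return sha256_hex(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            sha256_hex((level[i] + level[i + 1]).encode("utf-8"))
            for i in range(0, len(level), 2)
        ]
    return level[0]


def provenance_status(rows: list[dict]) -> dict:
    """rows: ordered dicts with ledger_seq, prev_hash, event_hash, and body (str)."""
    prev = ""
    for row in rows:
        expected = sha256_hex((prev + row["body"]).encode("utf-8"))
        if row["prev_hash"] != prev or row["event_hash"] != expected:
            return {"provenance_status": "integrity_failed",
                    "broken_seq": row["ledger_seq"]}
        prev = row["event_hash"]
    return {
        "provenance_status": "verified",
        "tip_hash": prev,
        "merkle_root": merkle_root([row["event_hash"] for row in rows]),
    }
```

The point of checking the whole chain at export time is that a `verified` label describes the ledger as it stands when the handoff is rendered, not as it stood when events were written.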

## Capability Backtest

The current evidence supports this scoped claim:

> Continuity gives AI teams verified handoff, task verification, and persistent
> memory across agents, sessions, tools, and models.

The capability backtest in
`docs/testing/2026-07-01-continuity-capability-backtest.md` distinguishes what
Continuity already captures from what the verified handoff markdown currently
renders:

| Capability | Captured Today | Rendered In Verified Handoff Today |
|---|---:|---:|
| task state | yes | yes |
| agent identity | yes | partial |
| model identity | yes | no |
| session identity | yes | no |
| handoff source | yes | no |
| parent/consumed source chain | yes | no |
| model call/output source | yes | partial |
| validation source | yes | partial |
| memory source | yes | partial |
| tool/action source | partial | partial |

The current gap is projection/rendering, not capture. Continuity should not add
new capture logic until a test proves existing ledger data cannot answer the
product question. The system does not claim endpoint observability, shadow-agent
detection, or automatic capture of tools/actions outside Continuity.

Additional explicit non-claims:

- The ledger proves event order and immutability, not authorship. Actor and
  agent identity are process/MCP/repo-local attribution, not cryptographic or
  authenticated identity.
- `consumed_seqs` currently records the prior context available to an event,
  not a selective causal proof of what shaped it. Selective grounding exists
  only for memory (`grounded_in_seqs`) and linked validations.
- Host support means MCP-compatible hosts. Claude Code and Codex paths are
  proven by isolated tests; other hosts are untested until a dated proof says
  otherwise.

## Model Adapters

The core test suite does not call external model providers. Provider SDKs are
optional and imported lazily by their adapters.

- OpenAI/OpenAI-compatible endpoints use `OPENAI_API_KEY` and optional
  `OPENAI_BASE_URL`.
- Anthropic uses `ANTHROPIC_API_KEY`.
- Ollama uses local HTTP by default at `http://localhost:11434`, includes a
  request timeout, and raises `AdapterError` with contextual failures instead
  of hanging or leaking low-level urllib errors.
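
The Ollama behavior described above might look roughly like the following. The `AdapterError` class here is a local stand-in, and the function name, default timeout, and injectable `urlopen` parameter are assumptions for illustration; only the `/api/generate` route and its `model`, `prompt`, `stream`, and `response` fields come from Ollama's HTTP API.

```python
import json
import urllib.error
import urllib.request


class AdapterError(RuntimeError):
    """Stand-in for the package's AdapterError; the real class may differ."""


def ollama_generate(prompt: str, model: str,
                    base_url: str = "http://localhost:11434",
                    timeout: float = 30.0,
                    urlopen=urllib.request.urlopen) -> str:
    """Call Ollama's /api/generate and wrap low-level failures with context."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        # The timeout bounds the call so an unresponsive server cannot hang it.
        with urlopen(request, timeout=timeout) as response:
            return json.load(response)["response"]
    except (urllib.error.URLError, TimeoutError) as exc:
        raise AdapterError(
            f"Ollama request to {base_url} failed for model {model!r}: {exc}"
        ) from exc
    except (ValueError, KeyError) as exc:
        raise AdapterError(
            f"Ollama returned an unexpected response for model {model!r}: {exc}"
        ) from exc
```

Passing `urlopen` as a parameter is a testing convenience in this sketch: a fake can simulate a refused connection without a running Ollama instance.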

## Repository Layout

```text
continuity-core/
  continuity/       # ledger, projections, API, MCP, Console, adapters
  scripts/          # demos and proof scripts
  tests/            # pytest suite
  README.md         # core package details

docs/
  README.md         # documentation authority map
  strategy/         # current strategic background; roadmap remains authoritative
  strategy/archive/ # older strategy/research source material
  superpowers/      # historical design specs and implementation plans
  testing/          # evidence notes from local and integration tests

examples/           # shareable proof artifacts generated from real ledger data

AGENTS.md
architecture_decisions.md
ROADMAP_AND_HANDOFF.md
```

## Plans

- [Roadmap & Handoff](ROADMAP_AND_HANDOFF.md) is the canonical execution
  tracker and current Phase 0-7 sequence. Its Execution Checkpoint controls
  current implementation work.
- [Documentation Map](docs/README.md) explains which docs are authoritative,
  historical, or evidence-only.
- [Master Adversarial Review Prompt](docs/MASTER_ADVERSARIAL_REVIEW_PROMPT.md)
  can be given to Claude/Kael, the GitHub agent, Codex, or another reviewer.
- [Idea Backlog](docs/IDEA_BACKLOG.md) captures ideas without promoting them
  into active execution.
- [Continuity Implementation Plan v3.4](docs/strategy/Continuity_Implementation_Plan_v3_4.md)
  preserves strategic background and the historical phase labels used during
  its adversarial review.

## CI

GitHub Actions runs the Python test suite on pushes and pull requests to
`main`. If Actions are disabled in repository settings, enable them once; after
that, tests run automatically with no manual trigger.

## License

Apache-2.0. See [LICENSE](LICENSE) and [NOTICE](NOTICE).

## Notes

- Runtime artifacts such as SQLite databases and exported ledgers are ignored
  under `continuity-core/`.
- The known Starlette/FastAPI `TestClient` deprecation warning is third-party
  dependency churn and does not indicate a Continuity test failure.
