Governed Active Memory

Zaxy 3 position paper (I8p — "Proof & Category Definition"). The category claim, the three differentiating claims, the now-real demo, and the honest evidence boundary for Zaxy's event-sourced agent-memory system. Every Zaxy capability cited below maps to shipped code on master; every number comes from a committed benchmark artifact and carries its caveat inline. Competitor figures are labeled vendor- or self-reported and are never restated as Zaxy's own. The grounding plan is ZAXY-3.md; the metrics index is AGENTS.md; the published benchmark surface is benchmarks.md.

Summary

The agent-memory field has converged on one idea: memory should be active — it should reflect on experience, distill skills, prevent repeated mistakes, and get better over time. The two strongest category rivals make memory active by letting it mutate itself: caura-memclaw runs an autonomous nightly "Crystallizer" over a mutable store; Letta (formerly MemGPT) gives the model self-editing memory blocks under last-write-wins concurrency. Both ship working active-learning loops. Both, by construction, give up the ability to answer "what changed, why, on whose evidence, and can we take it back?"

Zaxy's thesis — Governed Active Memory — is the inverse:

Other systems make memory active by letting it mutate itself. Zaxy makes memory active while keeping every change a gated, replayable, cited event.

This is not a slogan retrofitted onto a vector database. Zaxy's substrate is an append-only, hash-chained Eventloom log; the entire memory state is a pure replay function of that log. Every learning step — a reinforcement, a generated rule, a consolidation, a cross-agent promotion, a forget — is a sealed event with a citation and a hash-chain position. The 2026 governance literature now argues this is mandatory rather than optional. This paper states the category, maps the three claims to shipped code and event types, walks the demo that no mutable-store or last-write-wins system can reproduce, and draws a deliberately conservative line between what is proven today and what is not yet independently validated.
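The substrate idea in the paragraph above can be illustrated with a toy model. This is a minimal sketch, not Eventloom's implementation: the class ToyLog, the Event fields, the SHA-256 over canonical JSON, and the event type memory.written are all assumptions made for illustration. It shows only the two properties named above: each event's hash covers its predecessor's hash, and state is a pure fold over the ordered events.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int
    type: str
    payload: dict
    prev_hash: str
    hash: str


def _digest(seq, type_, payload, prev_hash):
    # Canonical JSON so the same event always hashes to the same value.
    body = json.dumps([seq, type_, payload, prev_hash], sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ToyLog:
    """Hypothetical append-only log; not the real Eventloom API."""

    GENESIS = "0" * 64

    def __init__(self):
        self.events = []

    def append(self, type_, payload):
        prev = self.events[-1].hash if self.events else self.GENESIS
        seq = len(self.events)
        event = Event(seq, type_, payload, prev, _digest(seq, type_, payload, prev))
        self.events.append(event)
        return event

    def verify(self):
        # Recompute every link; any edited or reordered event breaks the chain.
        prev = self.GENESIS
        for i, e in enumerate(self.events):
            if e.seq != i or e.prev_hash != prev:
                return False
            if e.hash != _digest(e.seq, e.type, e.payload, e.prev_hash):
                return False
            prev = e.hash
        return True


def replay(events):
    """Pure fold: the state depends only on the ordered events."""
    state = {}
    for e in events:
        if e.type == "memory.written":  # hypothetical event type
            state[e.payload["key"]] = e.payload["value"]
    return state
```

Because replay takes nothing but the events, two readers holding the same verified log necessarily reconstruct the same state, which is the property the rest of this paper leans on.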

The category, and where the rivals sit

The competitive teardown in ZAXY-3.md §3 is the source of truth; the factual positions are summarized here.

caura-memclaw (caura-ai) — the direct category rival. "Fleet memory for AI agents — governed, shared, self-improving." Architecture: a mutable PostgreSQL + pgvector + Redis store with an async event bus — not event-sourced. Its active-learning loop (the "Karpathy Loop") has agents report success/fail/partial outcomes; winners are reinforced and failures auto-generate preventive rule memories. A nightly Crystallizer LLM-merges near-duplicates and retires stale data autonomously, not review-gated. memclaw added a tamper-evident hash-chained audit log in v2.17.0 — but the hash-chaining is on the audit log only; the memory store itself remains mutable. Its proof points are [vendor-claimed] (eToro: 300+ agents, 26.5k memories, 1,372 shared skills, 23 ms p50) and [self-reported] (LoCoMo 77.6%, LongMemEval 72.5%, 96–98% token savings). These are memclaw's figures on memclaw's harness; they are not Zaxy results and are reproduced here only to characterize the rival.

Letta (letta-ai, formerly MemGPT): the category's research engine. An OS metaphor (MemGPT, arXiv 2310.08560) with core in-context memory blocks, recall, and archival tiers, plus self-editing memory (insert/replace/rethink tools). Its active loop is sleep-time agents: a background agent shares the primary's memory blocks and asynchronously rewrites them into "learned context" (Sleep-time Compute, Lin et al. 2025, arXiv 2504.13171; the paper reports [vendor/self-reported] ~5× less test-time compute). Edits are largely autonomous; concurrency is last-write-wins; there is no tamper-evident log and no first-class citations or grounded checkout.

Both rivals are ahead on the active axis (a shipped loop plus background reflection in production) and behind on provenance/governance-by-construction (mutable stores, autonomous overwrite, last-write-wins). Zaxy is the inverse, and Zaxy 3 closes the active gap without surrendering the substrate that makes the provenance claims true. The category Zaxy defines and intends to own is Governed Active Memory: active learning where every mutation is a gated, replayable, cited event.

The three claims, mapped to shipped code

Each claim below names the shipped module and event type that makes it true. The load-bearing invariant under all three (ZAXY-3 §9): the log is the source of truth, every derived artifact is authority_status=non_authoritative until it passes an explicit gate, nothing rewrites history, and everything cites an eventloom://<thread>/events/<seq>#<hash> source.
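The citation shape named in the invariant can be checked mechanically. The following is a small sketch of a parser for the eventloom://<thread>/events/<seq>#<hash> form. The source gives only the template, so the details are assumptions: that the thread id contains no slash, that seq is a decimal integer, and that the hash is lowercase hex. The names Citation and parse_citation are hypothetical, not Zaxy APIs.

```python
import re
from typing import NamedTuple

# Assumed grammar: thread without "/" or "#", decimal seq, lowercase hex hash.
_CITATION = re.compile(
    r"^eventloom://(?P<thread>[^/#]+)/events/(?P<seq>\d+)#(?P<digest>[0-9a-f]+)$"
)


class Citation(NamedTuple):
    thread: str
    seq: int
    digest: str


def parse_citation(uri: str) -> Citation:
    """Split a citation URI into its parts, or raise ValueError."""
    match = _CITATION.match(uri)
    if match is None:
        raise ValueError(f"not an eventloom citation: {uri!r}")
    return Citation(match["thread"], int(match["seq"]), match["digest"])


if __name__ == "__main__":
    print(parse_citation("eventloom://thread-1/events/42#ab12"))
```

A verifier built on this would then look up event seq in the named thread and compare its stored hash against the digest in the URI; that lookup is outside this sketch.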

Claim 1 — Provable evolution

Every reinforcement, rule, consolidation, promotion, and forget is a cited, hash-chained event, and the whole memory state is a replay of the log.

This is precisely the property the 2026 security literature now calls mandatory: long-term-memory security "cannot be retrofitted at retrieval or execution time alone, but must be anchored in storage-time provenance, versioning, and policy-aware retention from the outset" (Verifiable Memory Governance, Lin et al. 2026, arXiv 2604.16548). Zaxy is built that way from the substrate up.

Claim 2 — Drift-resistant consolidation

Compaction is additive, source-backed, and audited; the log is never rewritten by a summarize-and-overwrite step.

Competitors crystallize by LLM summarize-and-overwrite. SSGM (Lam et al. 2026, arXiv 2603.11768) names the failure mode: semantic drift — knowledge degrades through iterative summarization. Zaxy treats this warning as a hard constraint.

Crystallization runs as a one-shot reflection pass: there is no always-on daemon and no MCP tool, so the MCP surface stays pull-only. Each pass schedules existing primitives (consolidation, procedure mining, the metacognition monitor, the compaction audit, a read-only salience replay), routes every fresh candidate through the I4 gate, and appends one crystallization.run.completed summary event citing the candidate ids it produced. "Auto-apply" means an accepted review that is still non-authoritative and reversible within the rollback window; it is never a promotion to authority and never a destructive overwrite. Details: crystallization.md, consolidation.md.
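The difference between additive consolidation and summarize-and-overwrite can be sketched in a few lines. This is an illustrative model only: run_reflection_pass, the gate callable, the candidate fields, and the memory.consolidated event type are hypothetical, and the sketch assumes the summary event lists only candidates that passed the gate. Only crystallization.run.completed and the non_authoritative status come from the text above.

```python
from typing import Callable


def run_reflection_pass(
    log: list, candidates: list, gate: Callable[[dict], bool]
) -> list:
    """Return a new log: prior events, gated candidates, one summary event."""
    new_log = list(log)  # prior history is carried over, never edited
    produced = []
    for cand in candidates:
        if not gate(cand):
            continue  # rejected candidates leave no derived artifact
        new_log.append({
            "type": "memory.consolidated",  # hypothetical event type
            "candidate_id": cand["id"],
            "summary": cand["summary"],
            "cites": list(cand["cites"]),   # sources remain in the log
            "authority_status": "non_authoritative",
        })
        produced.append(cand["id"])
    new_log.append({
        "type": "crystallization.run.completed",
        "candidate_ids": produced,
    })
    return new_log
```

Since the source events stay in place and every summary cites them, a later pass can always be checked against the original observations rather than against an earlier summary, which is what keeps repeated passes from compounding drift.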

Claim 3 — Reversible and verified forgetting

Forgetting is reversible attenuation by default; hard deletion is governed cryptographic erasure that leaves the hash chain verifiable.

VMG (2604.16548) requires storage-time provenance, versioning, rollbackability, and verified-forgetting. Zaxy ships all four.

Honest caveat (carried from editability.md): the erasure crypto reuses Zaxy's portable-bundle envelope, which is experimental and unaudited — do not rely on it for high-value-secret or compliance guarantees without an independent cryptographic review.
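Why destroying a key can leave a hash chain verifiable is easiest to see in a toy. The sketch below is not Zaxy's portable-bundle envelope and is not production cryptography: it uses a one-time XOR pad purely so the example runs on the standard library, and ForgettableStore and its methods are hypothetical names. The point it demonstrates is structural: the chain hashes ciphertext, the key lives outside the chain, so erasing the key removes the plaintext without touching any hashed byte.

```python
import hashlib
import secrets


def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, key))


class ForgettableStore:
    """Toy model: chain over ciphertexts, keys held outside the chain."""

    GENESIS = b"\x00" * 32

    def __init__(self):
        self.chain = []  # list of (body, link_hash)
        self.keys = {}   # seq -> key; never hashed into the chain

    def _link(self, body: bytes) -> int:
        prev = self.chain[-1][1] if self.chain else self.GENESIS
        self.chain.append((body, hashlib.sha256(prev + body).digest()))
        return len(self.chain) - 1

    def append(self, plaintext: bytes) -> int:
        key = secrets.token_bytes(len(plaintext))  # toy one-time pad
        seq = self._link(_xor(plaintext, key))
        self.keys[seq] = key
        return seq

    def read(self, seq: int):
        key = self.keys.get(seq)
        return None if key is None else _xor(self.chain[seq][0], key)

    def forget(self, seq: int) -> None:
        self.keys.pop(seq, None)                        # destroy the key
        self._link(f"memory.forgotten:{seq}".encode())  # additive tombstone

    def verify(self) -> bool:
        prev = self.GENESIS
        for body, link in self.chain:
            if hashlib.sha256(prev + body).digest() != link:
                return False
            prev = link
        return True
```

After forget, verify still succeeds and read returns None for the erased entry. A real deployment would replace the XOR pad with an audited authenticated cipher and a key store that can actually guarantee destruction, which is exactly where the caveat above applies.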

The demo (now real)

The differentiation that survives a demo (ZAXY-3 §10) is a four-step sequence that exercises one Eventloom session end to end. Every step below maps to shipped code; none of it is possible on a mutable store (memclaw) or under last-write-wins (Letta).

  1. Evolve — generate a preventive rule from a failure. An agent reports a failure outcome on a recalled memory; MemoryFabric.generate_preventive_rule routes it through the I4 gate (evolution.gate.evaluated, op rule_generate) and appends a cited memory.rule.generated event (outcome_learning.py).
  2. Replay — reconstruct exactly how the rule came to be. The rule cites its source events; memory_replay / zaxy replay replays the hash-chained log to show the precise failure observations and the gate decision that produced it, and EventLog.verify() confirms the chain is intact.
  3. Roll it back — reverse the evolution. MemoryFabric.rollback_memory appends a cited memory.rolled_back event (editable.py) that, on replay, undoes the rule's effect — additively, without mutating the original rule event.
  4. Verified-forget — crypto-erase a payload while keeping verify() green. MemoryFabric.verified_forget destroys the wrapped key for a forgettable payload and appends a cited memory.forgotten tombstone (forgetting.py); the ciphertext and its hash are untouched, so the hash chain still verifies, yet the plaintext is gone for good.

A mutable store cannot replay how a rule was formed because it overwrote the evidence; a last-write-wins store cannot guarantee a rollback or a verified erase leaves an intact, tamper-evident chain. Governed Active Memory can do all four because each step is just one more sealed, cited event.
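Steps 1 to 3 of the demo can be modeled without any Zaxy code, which makes the "one more sealed event" argument concrete. This is a hedged sketch: the helper names active_rules and explain, the dictionary fields (cites, target_seq, rule), and the outcome.reported event type are invented for illustration; only memory.rule.generated, evolution.gate.evaluated, and memory.rolled_back are event types named in this paper. Hash chaining is omitted here to keep the focus on replay semantics.

```python
def active_rules(events):
    """Replay: a rule is active unless a later event rolls it back."""
    rules = {}
    for seq, event in enumerate(events):
        if event["type"] == "memory.rule.generated":
            rules[seq] = event["rule"]
        elif event["type"] == "memory.rolled_back":
            rules.pop(event["target_seq"], None)
    return rules


def explain(events, rule_seq):
    """Return the cited source events behind a rule, in citation order."""
    return [events[s] for s in events[rule_seq]["cites"]]


# Step 1 (evolve): a failure, a gate decision, then a cited rule.
events = [
    {"type": "outcome.reported", "outcome": "failure"},  # hypothetical type
    {"type": "evolution.gate.evaluated", "op": "rule_generate",
     "accepted": True, "cites": [0]},
    {"type": "memory.rule.generated", "rule": "avoid-x", "cites": [0, 1]},
]

# Step 2 (replay): the rule's evidence is still in the log.
evidence = explain(events, 2)

# Step 3 (roll back): append a new event; the rule event itself is untouched.
rolled_back = events + [
    {"type": "memory.rolled_back", "target_seq": 2, "cites": [2]},
]

if __name__ == "__main__":
    print(active_rules(events))
    print(active_rules(rolled_back))
```

Replaying the first three events yields the rule; replaying all four yields none, while the original rule event and its evidence remain available to explain. A store that overwrote the failure observation at step 1 would have nothing to return from explain.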

Evidence (honest)

Two committed artifacts back the claims. Both carry their caveats inline; neither is overstated.

FleetBench scaffold (governance, token efficiency, transfer proxy)

Source: reports/experimental/fleet-benchmark-scaffold/report.md (and report.json), version fleet-v1, fingerprint d4619d57…536cfa30 (scored fields only; latency excluded). Measured over real CoordinationBench runs at three scale points.

worker_count  coordination_quality  governance_correctness  cross_agent_transfer (proxy)  token_efficiency  latency_ms
3             0.907407              1.0                     1.0                           0.535             19.967
5             0.907407              1.0                     1.0                           0.586667          9.146
8             0.907407              1.0                     1.0                           0.645714          10.32
mean          0.907407              1.0                     1.0                           0.589127          13.144
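The mean row can be checked directly from the three scale points. The short script below assumes the mean is the unweighted arithmetic mean across worker counts, which is consistent with the printed values; the variable and function names are illustrative and do not come from the benchmark code.

```python
# Per-scale rows copied from the table above, keyed by worker_count.
COLUMNS = (
    "coordination_quality",
    "governance_correctness",
    "cross_agent_transfer",
    "token_efficiency",
    "latency_ms",
)
ROWS = {
    3: (0.907407, 1.0, 1.0, 0.535, 19.967),
    5: (0.907407, 1.0, 1.0, 0.586667, 9.146),
    8: (0.907407, 1.0, 1.0, 0.645714, 10.32),
}


def column_means(rows):
    """Unweighted mean of each column across the scale points."""
    per_column = zip(*rows.values())
    return {name: sum(col) / len(col) for name, col in zip(COLUMNS, per_column)}


if __name__ == "__main__":
    for name, value in column_means(ROWS).items():
        print(f"{name}: {value:.6f}")
```

Two things are visible in the raw rows: token_efficiency rises with worker count across these three points, and latency_ms does not vary monotonically, which fits its exclusion from the scored fingerprint.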

What this shows, with caveats:

REAL, exact-scored, deterministic aggregates of CoordinationBench signals (accepted parent state, stale-claim rejection, duplicate-consolidation rejection, non-authoritative-leakage prevention, evidence coverage). This is direct evidence for the governance-correctness claim on the harness.

LongMemEval 500 hash report (plumbing and recall parity)

Source: AGENTS.md Metrics table. The full 500-question LongMemEval-compatible hash checkout: mean 0.724, Answer@5 0.628, Recall@5 0.972, citation coverage 1.000, p95 1472.11 ms, p99 2652.55 ms.

Caveats, stated plainly:

Current evidence boundary

This is the section the honesty discipline exists for. Drawing the line clearly is part of the claim.

Proven today (on committed artifacts):

Not yet independently validated (do not claim):

No number in this document is a substitute for any of the above. The category claim rests on the governance/provenance properties, which are proven; the performance and head-to-head claims are explicitly deferred.

Research foundation

The Zaxy 3 design borrows mechanism from neuroscience and the agent-memory literature, and the 2026 governance papers validate the substrate bet. The full mapping is in ZAXY-3.md §5; the essentials follow with real links.

Neuroscience → mechanism.

Agent-memory systems → what to match/beat.

2026 governance → why the substrate bet is right.

Reproducibility

Every number above is checkable from committed artifacts.

FleetBench scaffold. Regenerate over real CoordinationBench cases with the shipped CLI:

zaxy fleet-benchmark --output-dir reports/benchmarks/fleet-v1 \
  --worker-counts 3,5,8 --missions 1

The exact command that produced the committed scaffold artifact (reports/experimental/fleet-benchmark-scaffold/report.md) is:

env PYTHONPATH=src EMBEDDING_ENABLED=true EMBEDDING_PROVIDER=hash EMBEDDING_DIMENSION=1536 \
  python -c 'from pathlib import Path; from zaxy_benchmarks.fleet_benchmark import run_fleet_benchmark; run_fleet_benchmark(Path("reports/experimental/fleet-benchmark-scaffold"))'

The scored fields are deterministic and fingerprinted; latency_ms is excluded because it is wall-clock and environment-dependent.
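The reason for excluding latency_ms from the fingerprint can be shown with a generic construction. This sketch does not reproduce the committed fingerprint and is not the algorithm the benchmark uses; the function name, the SHA-256 over canonical JSON, and the exclusion list are assumptions chosen only to show the principle that a fingerprint over deterministic fields stays stable across machines.

```python
import hashlib
import json


def fingerprint(rows, excluded=("latency_ms",)):
    """Hash only the deterministic fields of each row (illustrative)."""
    scored = [
        {key: value for key, value in row.items() if key not in excluded}
        for row in rows
    ]
    # sort_keys plus fixed separators gives one canonical byte string.
    canonical = json.dumps(scored, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    run_a = [{"worker_count": 3, "token_efficiency": 0.535, "latency_ms": 19.967}]
    run_b = [{"worker_count": 3, "token_efficiency": 0.535, "latency_ms": 7.5}]
    print(fingerprint(run_a) == fingerprint(run_b))
```

Two runs that differ only in wall-clock latency hash identically under this scheme, while any change to a scored field changes the digest.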

LongMemEval 500. The hash-report metrics live in the AGENTS.md Metrics table; the published headline checkout report and its claim boundary live in benchmarks.md (the headline 500 artifact under reports/benchmarks/). The benchmark page is the authority on which artifact is the current public claim.