Event-sourced memory for agent work

Zaxy

Zaxy keeps multi-agent projects replayable: parent missions, isolated worker sessions, cited findings, purpose-conditioned checkout, conflict review, and accepted merge-back into one durable project history.

runtime: embedded Kuzu (default)
source: Eventloom append-only JSONL
checkout: cited, purpose-conditioned context
PyPI 2.0.0 · embedded local runtime · Harvey LAB (external signal): 10/10 tasks, mean 0.788 · Headline 500 (checkout evidence): R@5 1.000 · external verification requested

Coordinate

Worker-local claims are not project truth.

Spawning agents is easy. The hard part is turning isolated investigations into one trustworthy state of work. Zaxy records each worker in its own Eventloom session, reviews findings with evidence, marks stale and conflicting claims, and promotes only accepted facts into the parent mission.
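A minimal sketch of stale and conflict marking, assuming a hypothetical claim record (worker, subject, value, event sequence number) and a simple rule chosen for illustration: a claim recorded before the latest change to its subject is stale, and fresh claims on the same subject that disagree are conflicts. This is not Zaxy's review logic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Hypothetical worker claim record (not Zaxy's schema).
    worker: str
    subject: str  # what the claim is about, e.g. "auth.token_ttl"
    value: str    # the asserted value
    seq: int      # event sequence number when the claim was recorded


def mark_claims(claims, source_changed_at):
    """Label each claim "ok", "stale", or "conflict".

    source_changed_at maps a subject to the sequence number of the
    latest event that changed it. A claim recorded before that change
    is stale; fresh claims on the same subject that disagree conflict.
    """
    labels = {}
    fresh_by_subject = defaultdict(list)
    for claim in claims:
        if claim.seq < source_changed_at.get(claim.subject, -1):
            labels[claim] = "stale"
        else:
            fresh_by_subject[claim.subject].append(claim)
    for group in fresh_by_subject.values():
        status = "conflict" if len({c.value for c in group}) > 1 else "ok"
        for claim in group:
            labels[claim] = status
    return labels
```

With two workers reporting different token TTLs and a third claim about a subject that changed after it was recorded, the first two come back as "conflict" and the third as "stale"; neither label is promoted without review.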

Parent mission

The coordinator owns accepted project history, decisions, handoff, and Memory Checkout state.

Worker sessions

Agents investigate in isolated logs, so exploration does not contaminate authoritative memory.

Approval packets

Human or coordinator-agent review accepts, rejects, defers, or promotes findings with cited provenance.
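A minimal sketch of applying an approval packet, assuming a hypothetical packet shape (a list of (finding_id, decision) pairs) and two rules chosen for illustration: accept only marks a finding as reviewed, and promote copies an accepted, cited finding into the parent mission. Zaxy's coordination_approval_packet payload and its exact semantics may differ.

```python
from dataclasses import dataclass, field

DECISIONS = {"accept", "reject", "defer", "promote"}


@dataclass
class Finding:
    # Hypothetical worker-local finding.
    finding_id: str
    text: str
    citations: list
    status: str = "pending"  # pending | accepted | rejected | deferred | promoted


@dataclass
class ParentMission:
    # Accepted project history; in this sketch only promoted findings land here.
    accepted: list = field(default_factory=list)


def apply_packet(packet, findings, parent):
    """Apply (finding_id, decision) pairs in order; return the parent mission."""
    by_id = {f.finding_id: f for f in findings}
    for finding_id, decision in packet:
        if decision not in DECISIONS:
            raise ValueError(f"unknown decision: {decision}")
        finding = by_id[finding_id]
        if decision == "promote":
            # Assumed rule: promotion requires prior acceptance and a citation.
            if finding.status != "accepted" or not finding.citations:
                raise ValueError(f"cannot promote {finding_id}: not accepted or uncited")
            parent.accepted.append(
                (finding.finding_id, finding.text, tuple(finding.citations))
            )
            finding.status = "promoted"
        else:
            finding.status = {
                "accept": "accepted",
                "reject": "rejected",
                "defer": "deferred",
            }[decision]
    return parent
```

Keeping promotion as a separate, checked step means a worker's claim can be reviewed and still stay out of project truth until someone explicitly promotes it.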

Architecture

Eventloom is truth. The graph is a rebuildable projection.

Mission: objective and parent state
Workers: isolated Eventloom sessions
Findings: evidence, confidence, citations
Review: conflicts, stale claims, approvals
Checkout: accepted cited prompt state
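The last stage of this pipeline can be pictured as rendering accepted findings into a cited prompt block. A sketch under stated assumptions: findings arrive as (text, citations) pairs, citation strings like "auth-api#12" are a made-up format, and uncited findings are dropped so every rendered line stays traceable. The render_checkout name is hypothetical.

```python
def render_checkout(mission_objective, accepted):
    """Render accepted findings as a numbered, cited prompt block.

    accepted: list of (text, citations) pairs; citations are opaque
    event references in a hypothetical "worker#seq" format.
    Findings without citations are skipped.
    """
    lines = [f"Mission: {mission_objective}"]
    kept = 0
    for text, citations in accepted:
        if not citations:
            continue  # never hand an agent an untraceable claim
        kept += 1
        lines.append(f"{kept}. {text} [{', '.join(citations)}]")
    return "\n".join(lines)
```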

Eventloom remains the source of truth: the append-only project record. The default local runtime is embedded Kuzu, launched and cleaned up by zaxy init and zaxy doctor. Neo4j remains the sidecar control backend; pgGraph, LatticeDB, and Pathlight are advanced integration tracks for teams that need an alternate deployment or observability posture.
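Because the graph is a projection, it can be thrown away and rebuilt by replaying the log. The sketch below folds hypothetical JSONL events into an in-memory dict and set that stand in for a graph backend such as Kuzu or Neo4j; the event schema and the replay function are assumptions for illustration, not Eventloom's format.

```python
import json


def replay(jsonl_lines):
    """Rebuild a graph projection by folding events in log order.

    Hypothetical event types:
      {"type": "node", "id": ..., "props": {...}}
      {"type": "edge", "src": ..., "rel": ..., "dst": ...}
    The log is only read; the projection can be discarded and rebuilt.
    """
    nodes, edges = {}, set()
    for line in jsonl_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event["type"] == "node":
            # Later events update earlier properties of the same node.
            nodes.setdefault(event["id"], {}).update(event.get("props", {}))
        elif event["type"] == "edge":
            edges.add((event["src"], event["rel"], event["dst"]))
    return nodes, edges
```

Replaying the same log twice yields the same projection, which is what makes it safe to swap or rebuild the graph backend without losing history.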

Purpose control plane

The same evidence can mean different memory for different work.

Zaxy now carries purpose through retrieval, checkout diagnostics, feedback, compaction, and Coordinate accepted state. This is still framed as project-local agent work memory, not a broad Company Brain claim.

memory_checkout(..., purpose="coding")

Applies deterministic purpose emphasis, recall floors, scoring profile selection, and checkout guidance.
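A minimal sketch of deterministic purpose-conditioned ranking with a recall floor. The profile table, item kinds, weights, and the rank_for_purpose function are hypothetical and are not Zaxy's memory_checkout; the point is that the same inputs always yield the same order, and a floor can guarantee that at least one item of a required kind survives the top-k cut.

```python
# Hypothetical scoring profiles: per-kind weight multipliers plus recall
# floors (minimum number of items of a kind kept in the top-k).
PROFILES = {
    "coding": {"weights": {"code": 1.5, "decision": 1.2, "chat": 0.5},
               "floors": {"decision": 1}},
    "review": {"weights": {"decision": 1.5, "finding": 1.3},
               "floors": {"finding": 1}},
}


def rank_for_purpose(items, purpose, k):
    """Return the ids of k items chosen for a purpose, deterministically.

    items: list of (item_id, kind, base_score) tuples.
    """
    profile = PROFILES[purpose]
    weights, floors = profile["weights"], profile["floors"]
    # Weighted score descending; item_id breaks ties so input order never matters.
    ranked = sorted(items, key=lambda it: (-it[2] * weights.get(it[1], 1.0), it[0]))
    chosen = []
    # Satisfy each recall floor first with the best items of that kind.
    for kind, minimum in sorted(floors.items()):
        best_of_kind = [it for it in ranked if it[1] == kind and it not in chosen]
        chosen.extend(best_of_kind[:minimum])
    # Fill any remaining slots in ranked order.
    for it in ranked:
        if len(chosen) >= k:
            break
        if it not in chosen:
            chosen.append(it)
    # Present the final selection in ranked order.
    return [it[0] for it in sorted(chosen[:k], key=ranked.index)]
```

For items [("a", "chat", 0.9), ("b", "code", 0.5), ("c", "decision", 0.3), ("d", "code", 0.4)], rank_for_purpose(items, "coding", 3) returns ["b", "d", "c"]: the decision floor keeps "c" even though the weighted chat item "a" would otherwise take the third slot.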

zaxy memory purpose status

Replays active profile, checkout quality, accepted Coordinate state, and feedback posture without graph mutation.

zaxy memory purpose lanes

Shows purpose-specific checkout lanes, cited source groups, and suppression candidates.
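One way to picture lanes is as a per-purpose grouping of memory items by cited source, with weak items set aside as suppression candidates rather than deleted. The item keys, the purpose_lanes function, and the min_score threshold are assumptions for illustration only.

```python
from collections import defaultdict


def purpose_lanes(items, purpose, min_score=0.2):
    """Group one purpose's items by cited source and flag weak ones.

    items: list of dicts with hypothetical keys "id", "purpose",
    "source", and "score".
    Returns (lanes, suppression_candidates); lanes maps each source
    to its item ids, highest score first.
    """
    lanes = defaultdict(list)
    suppress = []
    for item in sorted(items, key=lambda i: (-i["score"], i["id"])):
        if item["purpose"] != purpose:
            continue
        if item["score"] < min_score:
            suppress.append(item["id"])  # candidate only; nothing is deleted
        else:
            lanes[item["source"]].append(item["id"])
    return dict(lanes), suppress
```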

zaxy memory purpose feedback

Surfaces positive and negative outcome history so future retrieval can prioritize useful purpose-specific memory.
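A sketch of how outcome history could feed back into retrieval, assuming hypothetical (item_id, purpose, outcome) records and a feedback_multiplier helper that does not come from Zaxy. It uses a Laplace-smoothed success rate so an item with no history stays neutral, and it keeps feedback separate per purpose.

```python
from collections import Counter


def feedback_multiplier(history, item_id, purpose):
    """Turn outcome history into a score multiplier for one purpose.

    history: list of (item_id, purpose, outcome) tuples, where outcome
    is "positive" or "negative".
    """
    counts = Counter(
        outcome for i, p, outcome in history if i == item_id and p == purpose
    )
    pos, neg = counts["positive"], counts["negative"]
    success_rate = (pos + 1) / (pos + neg + 2)  # 0.5 when there is no history
    return 0.5 + success_rate  # stays between 0.5 and 1.5; 1.0 is neutral
```

Two positive outcomes for an item under "coding" give a multiplier of 1.25, two negative ones give 0.75, and neither affects how the same item ranks for "review".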

Interfaces

CLI, MCP, dashboard, and adapters share the same contracts.

coordination_checkout

Accepted parent state plus diagnostic worker-local findings.

coordination_approval_packet

Reviewable accept/reject/defer/promote payloads.

memory_checkout

Answerability, current_citation_count, required action, and memory_feedback guidance.

CoordinationAdapter

Dependency-light Python wrapper with LangGraph and CrewAI helper paths.

dashboard --enable-coordinate-review

Opt-in human review controls over replay-backed state; read-only remains the default.
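The shared-contract idea can be sketched as one typed result that every interface serializes the same way. The field names echo the memory_checkout outputs listed above; the types, example values, the CheckoutResult class, and the to_wire/from_wire helpers are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CheckoutResult:
    # Names follow the memory_checkout outputs; types and values are assumed.
    answerability: str                # e.g. "answerable" or "insufficient"
    current_citation_count: int
    required_action: str              # e.g. "none" or "gather_evidence"
    memory_feedback_guidance: str     # hint for a later memory_feedback call


def to_wire(result: CheckoutResult) -> str:
    """Serialize with sorted keys so CLI, MCP, and dashboard emit the same JSON."""
    return json.dumps(asdict(result), sort_keys=True)


def from_wire(payload: str) -> CheckoutResult:
    """Parse a payload back into the same typed contract."""
    return CheckoutResult(**json.loads(payload))
```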

Benchmark evidence

Public claims stay inside the evidence boundary.

Current public benchmark evidence is intentionally narrow: the headline 500-question LongMemEval-compatible checkout diagnostic and the Harvey LAB external legal-agent memory-ablation report. Older backend shootouts, partial slices, suite gates, and debug reports are archived as development history rather than current claims.

Harvey LAB

0.788 mean criterion pass rate

Full ten-task external legal-agent memory-ablation run, +0.184 versus regular/no-memory and 9/10 task wins versus article-best rows.

Headline 500

R@5 1.000, citation coverage 1.000

Full 500-question LongMemEval-compatible checkout diagnostic: mean 0.956, Answer@5 0.910, Recall@5 1.000.
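For readers checking what a Recall@5 of 1.000 means, here is one common way to compute Recall@k over a question set. Whether the headline harness counts a hit when any gold item appears in the top k, or only when all of them do, is an assumption here, so the recall_at_k sketch supports both.

```python
def recall_at_k(runs, k=5, require_all=False):
    """Fraction of questions whose top-k retrieved ids contain the gold evidence.

    runs: list of (retrieved_ids_in_rank_order, gold_ids) pairs.
    require_all=False counts a hit if any gold id is in the top k;
    require_all=True needs every gold id there.
    """
    if not runs:
        raise ValueError("recall is undefined for an empty question set")
    hits = 0
    for retrieved, gold in runs:
        top_k, gold_set = set(retrieved[:k]), set(gold)
        hit = gold_set <= top_k if require_all else bool(gold_set & top_k)
        if hit:
            hits += 1
    return hits / len(runs)
```

A score of 1.000 under either definition means every question had its gold evidence inside the top five; it says nothing about whether a downstream assistant answered correctly, which is why the claim boundary matters.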

Claim boundary

Checkout diagnostic, not official LME

The headline 500 is a Zaxy same-harness checkout run, not an official LongMemEval end-to-end assistant score.

Comparison posture

Evidence first, claims second

Archived reports remain useful for engineering history, but public benchmark claims now route through the benchmark hub.

Current evidence boundary

These rows are release evidence and disclosure status, not a universal memory leaderboard.

Harvey LAB external memory-ablation
Status: complete
Supports: full 10-task legal-agent benchmark evidence (0.788 mean criterion pass rate, +0.184 vs regular/no-memory, +0.081 vs article-best rows, 9/10 task wins).
Does not support: same-harness full-suite scores for non-Zaxy systems beyond the article-published matrix.

LongMemEval-compatible checkout 500
Status: current headline
Supports: same-harness checkout diagnostic (mean 0.956, Answer@5 0.910, Recall@5 1.000, citation coverage 1.000).
Does not support: official LongMemEval end-to-end assistant accuracy or an external memory-system leaderboard ranking.

Install

Initialize a local embedded runtime, then expose memory through MCP.

pipx install zaxy-memory
zaxy init
zaxy memory log --eventloom-path .eventloom --limit 5
zaxy memory bootstrap --eventloom-path .eventloom
zaxy doctor --eventloom-path .eventloom
zaxy coordinate start "ship auth refactor" --mission auth-main
zaxy coordinate worker create --mission auth-main --worker auth-api
zaxy coordinate assign --mission auth-main --worker auth-api "trace failures"
zaxy coordinate brief --mission auth-main
zaxy coordinate checkout --mission auth-main

What happens when you run init

Zaxy writes `.env.local`, records session genesis and heartbeat, checks graph posture, and prints the MCP command or config path.

What stays local

Session history lives in .eventloom/ as append-only JSONL. The graph is a rebuildable projection.
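A minimal sketch of append-only JSONL capture using only the Python standard library. The event fields (seq, type), names such as session_genesis and heartbeat, and the file layout under .eventloom/ are hypothetical stand-ins; Eventloom's real schema may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def append_event(log_path, event):
    """Append one event as a single JSON line; earlier lines are never rewritten."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
        f.flush()
        os.fsync(f.fileno())  # make the append durable before returning


def read_events(log_path):
    """Read every event back in append order, skipping blank lines."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: one session log under .eventloom/.
        log = Path(root) / ".eventloom" / "main.jsonl"
        append_event(log, {"seq": 1, "type": "session_genesis"})
        append_event(log, {"seq": 2, "type": "heartbeat"})
        print(read_events(log))
```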

How to prove capture

memory log, memory bootstrap, doctor, and hook-status report the last checkout, capture status, and stale-memory posture.
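A sketch of the kind of posture check these commands summarize, assuming two hypothetical timestamps (last checkout and last capture) and an illustrative 24-hour freshness window. The memory_posture function, its labels, and its rule are not Zaxy's; they only show how stale memory can be detected from log metadata without touching the graph.

```python
from datetime import datetime, timedelta


def memory_posture(last_checkout_at, last_capture_at, now, max_age=timedelta(hours=24)):
    """Summarize capture and checkout staleness from two timestamps.

    Assumed rule: capture is "missing" if nothing was ever captured;
    a checkout is "stale" if it predates the latest capture or is
    older than max_age.
    """
    if last_capture_at is None:
        return {"capture": "missing", "checkout": "unknown"}
    if last_checkout_at is None:
        return {"capture": "ok", "checkout": "never"}
    stale = last_checkout_at < last_capture_at or now - last_checkout_at > max_age
    return {"capture": "ok", "checkout": "stale" if stale else "fresh"}
```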

Documentation

Start with Coordinate and purpose. Keep the rest as operator reference.

Operator and internals reference