Cold-Checkout Performance on Large Logs — Design

Status: proposed (design-first; agree before code)
Date: 2026-06-16
Owner: Zaxy core

1. Problem

A cold checkout — a fresh process with empty in-process caches — is slow on a large session log. Measured on a real ~115 MB zaxy-default.jsonl (~118k events): ~18–22 s. This is paid by every CLI invocation (zaxy memory checkout, hook-driven calls), on every server restart, and after a cache invalidation. The warm path (a long-lived server with populated in-process caches) is already ~1.6 s; the gap is entirely cold-start rebuild cost.

The in-process incremental caches (SessionRetrievalCache, shipped 2.4.2–2.4.4) extend the verbatim index and verified replay with only the appended tail — but they live in memory and die with the process, so a cold process rebuilds everything from the whole log.

2. Evidence (cProfile, real 115 MB log)

Cost | Time | Symbol
Verbatim BM25 tokenization | ~12 s | verbatim.py:_tokens (5.6 s self time across 79k events) plus postings (_term_document_ids) inside VerbatimIndex.from_events
Verified hash-chain replay | ~6–9 s | event.py:verify / verify_event_chain over every event, via _cached_full_replay
Salience ledger and other full-log replays | ~4–9 s | eventlog.read_all() consumers in core/fabric.py (e.g. lines 970, 1337, 1517, 1546, 2090, 2106, 2330)
(Redundant parse) | ~4–5 s | investigated and rejected as a fix: marginal and superseded (see §7)

Key correction to an earlier assumption: the cost is dominated by tokenization and verification, not by reading and parsing. The fix must avoid re-tokenizing and re-verifying the whole log on cold start, not merely avoid re-reading it.
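A minimal sketch of how a profile like the one above can be taken with the standard-library cProfile and pstats modules. toy_tokens and toy_cold_build are hypothetical stand-ins for verbatim.py:_tokens and VerbatimIndex.from_events, not the real code; the point is that sorting by tottime (self time) is what surfaces a hot tokenizer, as with the 5.6 s self-time entry for _tokens.

```python
import cProfile
import pstats
import re

# Hypothetical stand-in for verbatim.py:_tokens; the real tokenizer may differ.
_WORD = re.compile(r"\w+")


def toy_tokens(text: str) -> list[str]:
    """Lowercased word tokens, a rough proxy for BM25 tokenization."""
    return [m.group(0).lower() for m in _WORD.finditer(text)]


def toy_cold_build(events: list[str]) -> dict[str, set[int]]:
    """Hypothetical cold build: tokenize every event and build postings."""
    postings: dict[str, set[int]] = {}
    for doc_id, text in enumerate(events):
        for term in toy_tokens(text):
            postings.setdefault(term, set()).add(doc_id)
    return postings


def profile_cold_build(events: list[str], top: int = 10) -> pstats.Stats:
    """Profile one cold build and print the `top` entries by self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    toy_cold_build(events)
    profiler.disable()
    stats = pstats.Stats(profiler)
    # "tottime" is self time: time spent in a function excluding its callees.
    stats.sort_stats("tottime").print_stats(top)
    return stats
```

Without code changes, `python -m cProfile -s tottime <script>` gives the same self-time view of a whole run.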

3. Goals / Non-goals

Goals

Non-goals

4. Invariants (hard constraints)

  1. Log is authority. The checkpoint is a pure cache. Any mismatch, corruption, version skew, or shrink/rewrite → discard and full rebuild. The system must be correct with the checkpoint deleted.
  2. Integrity preserved. The hash chain is verified up to the live tip on every cold load. We never trust cached derived state without anchoring it to a verified event hash from the live log (see trust model §5.2).
  3. Byte-identical results. A checkpoint-loaded-and-tail-extended index/replay must produce results identical to a from-scratch rebuild over the same log. This is the core correctness test.
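Invariant 3 can be sketched as a test that checkpoints at every split point, round-trips the checkpoint through serialization, extends it with the tail, and compares bytes against a from-scratch rebuild. The toy term-to-event-ids index and the names toy_terms, extend_index, build_index and check_byte_identical are hypothetical; the real test would exercise VerbatimIndex and the verified replay instead.

```python
import json
import re

_WORD = re.compile(r"\w+")


def toy_terms(text: str) -> list[str]:
    # Hypothetical tokenizer; stands in for the real BM25 tokenization.
    return [m.group(0).lower() for m in _WORD.finditer(text)]


def extend_index(index: dict[str, list[int]], tail: list[str], start: int) -> dict[str, list[int]]:
    """Return a copy of `index` extended with `tail`, whose ids begin at `start`."""
    out = {term: list(ids) for term, ids in index.items()}  # leave the loaded copy untouched
    for offset, text in enumerate(tail):
        doc_id = start + offset
        for term in sorted(set(toy_terms(text))):
            out.setdefault(term, []).append(doc_id)  # ids only grow, so lists stay sorted
    return out


def build_index(events: list[str]) -> dict[str, list[int]]:
    """From-scratch rebuild over the whole log."""
    return extend_index({}, events, start=0)


def canonical_bytes(index: dict[str, list[int]]) -> bytes:
    # Sorted keys and fixed separators give one byte encoding per logical index.
    return json.dumps(index, sort_keys=True, separators=(",", ":")).encode("utf-8")


def check_byte_identical(events: list[str], split: int) -> bool:
    """Checkpoint at `split`, reload it, extend with the tail, compare to a rebuild."""
    checkpoint = canonical_bytes(build_index(events[:split]))
    loaded = json.loads(checkpoint)
    extended = extend_index(loaded, events[split:], start=split)
    return canonical_bytes(extended) == canonical_bytes(build_index(events))
```

Running the check for every split from 0 (empty checkpoint) to len(events) (empty tail) covers both boundary cases.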

5. Design — persistent verified derived-state checkpoint

5.1 Where it lives

Per session, beside the projections it accompanies: .eventloom/projections/<session>.retrieval-cache/ containing a small header plus the serialized derived structures. It is a derived artifact (cache), git-ignored, and safe to delete at any time.

5.2 Trust model (the integrity anchor)

The user-selected approach: a dedicated header recording the covered tip.

The header records: format_version, covered_seq, covered_hash (the hash of the event at covered_seq), and the source log path. On cold load:

  1. Read the live event at covered_seq; if it is missing or its hash ≠ covered_hash, the log was rewritten/compacted/shrunk → discard, full rebuild.
  2. Otherwise the prefix 1..covered_seq is anchored to a verified hash. Verify only the tail (covered_seq+1 .. live tip) with verify_event_chain, anchored at covered_hash; this is the exact mechanism _extend_replay already uses.
  3. Extend the loaded derived structures with the tail (the existing append_chunks / _extend_replay paths). Persist the new tip.

The prefix is trustworthy because a checkpoint is only ever written after the chain up to its covered tip has been verified (a full verify on a cold build, or an anchored tail verify as in step 3), so covered_hash is a verified anchor. Re-matching it against the live log and verifying the tail re-establishes whole-chain integrity without re-hashing the prefix.
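The three-step cold load can be sketched against a toy hash chain. Everything here is hypothetical: the Event and Header shapes, the sha256(prev_hash + payload) chaining rule, verify_chain (standing in for verify_event_chain) and plan_cold_load. It only illustrates the decision logic: any mismatch falls back to a rebuild, and otherwise only the tail is verified, anchored at covered_hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

GENESIS = "0" * 64
FORMAT_VERSION = 1  # hypothetical; bump on any checkpoint layout change


@dataclass(frozen=True)
class Event:
    seq: int  # 1-based position in the log
    payload: str
    prev_hash: str
    hash: str


@dataclass(frozen=True)
class Header:
    format_version: int
    covered_seq: int
    covered_hash: str
    source_log: str


def event_hash(prev_hash: str, payload: str) -> str:
    # Toy chaining rule; the real event.py hashing differs.
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[Event], payload: str) -> None:
    prev = log[-1].hash if log else GENESIS
    log.append(Event(len(log) + 1, payload, prev, event_hash(prev, payload)))


def verify_chain(events: list[Event], anchor: str) -> bool:
    """Check every link, starting from the already-trusted `anchor` hash."""
    prev = anchor
    for ev in events:
        if ev.prev_hash != prev or ev.hash != event_hash(prev, ev.payload):
            return False
        prev = ev.hash
    return True


def plan_cold_load(header: Optional[Header], log_path: str, log: list[Event]) -> tuple[str, list[Event]]:
    """Return ("tail", events_to_apply) if the checkpoint is usable, else ("rebuild", [])."""
    if header is None or header.format_version != FORMAT_VERSION or header.source_log != log_path:
        return "rebuild", []
    # Step 1: the live event at covered_seq must exist and still carry covered_hash.
    if not 1 <= header.covered_seq <= len(log):
        return "rebuild", []  # log shrank below the checkpoint
    if log[header.covered_seq - 1].hash != header.covered_hash:
        return "rebuild", []  # log rewritten or compacted
    # Step 2: verify only the tail, anchored at covered_hash.
    tail = log[header.covered_seq:]
    if not verify_chain(tail, anchor=header.covered_hash):
        return "rebuild", []  # a full rebuild re-verifies from genesis
    # Step 3 (not shown): extend the loaded structures with `tail`, persist the new tip.
    return "tail", tail
```

A failed tail verify is routed to the rebuild path rather than trusted, so a corrupt log is still caught by the full verify from genesis.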

5.3 What to persist

5.4 Serialization format (open — decided by prototype)

Constraints: load must be materially faster than rebuild; the format must be safe to load from a file on disk; size must stay reasonable. Options to measure: a compact binary (e.g. msgpack), a hand-rolled binary, or JSON. Pickle is discouraged (code-execution risk on a tampered file); if chosen, it must be strictly gated by the header check and treated as the same trust domain as the log. Prefer a non-executable format. The decision is tracked as open question 1 in §8.
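As a baseline for the "safe to load" constraint, a JSON checkpoint can be sketched as below: json.loads only ever produces dicts, lists, strings, numbers, booleans and None, so a tampered file can at worst fail validation and trigger a rebuild, never run code. The single-file layout, the "header"/"postings" keys and the function names are hypothetical; msgpack or a hand-rolled binary would slot into the same load-validate-or-None shape.

```python
import json
from pathlib import Path
from typing import Any, Optional

FORMAT_VERSION = 1  # hypothetical; bump on any layout change


def dump_checkpoint(path: Path, header: dict[str, Any], postings: dict[str, list[int]]) -> None:
    # Plain write for brevity; real writes should go through an atomic rename.
    doc = {"header": header, "postings": postings}
    path.write_text(json.dumps(doc, separators=(",", ":")), encoding="utf-8")


def load_checkpoint(path: Path) -> Optional[tuple[dict[str, Any], dict[str, list[int]]]]:
    """Return (header, postings), or None if the file is missing or malformed."""
    try:
        doc = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad UTF-8, or invalid JSON
        return None
    if not isinstance(doc, dict):
        return None
    header, postings = doc.get("header"), doc.get("postings")
    if not isinstance(header, dict) or header.get("format_version") != FORMAT_VERSION:
        return None  # version skew: caller does a full rebuild
    if not isinstance(postings, dict) or not all(
        isinstance(ids, list) and all(isinstance(i, int) for i in ids)
        for ids in postings.values()
    ):
        return None  # structurally wrong payload: treat as corrupt
    return header, postings
```

Returning None for every failure keeps the caller's logic to a single branch: no usable checkpoint means a full rebuild.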

5.5 Integration

All changes localized to SessionRetrievalCache (src/zaxy/retrieval_cache.py), which already owns the cold/tail/invalidate logic and is the single shared implementation behind both the fabric and the MCP front door:

5.6 Concurrency

Multiple local processes may read/write. Reads are always safe (load → verify → rebuild-on-mismatch). Writes use atomic rename; last-writer-wins. A stale checkpoint only means a longer tail to verify — still correct, never wrong. No lock required; document this reasoning. (A lockfile is a possible optimization, not a correctness requirement.)
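The atomic-rename write can be sketched with the standard library: write to a temporary file in the same directory, fsync it, then os.replace it over the target. Readers therefore see either the previous checkpoint or the new one, never a partial file, and concurrent writers simply race to the last rename. The function name and the "checkpoint.json" file name used below are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, data: bytes) -> None:
    """Write `data` to `target` so readers see either the old file or the new one."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so os.replace is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # contents are durable before they become visible
        os.replace(tmp_name, target)  # atomic rename on POSIX; last writer wins
    except BaseException:
        try:
            os.unlink(tmp_name)  # never leave a stray temp file behind
        except FileNotFoundError:
            pass
        raise
```

The directory itself is not fsynced here; per the reasoning above, losing the rename on a crash only leaves an older checkpoint and a longer tail to verify.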

6. Phasing (separate PRs; measurable targets)

Each phase: byte-identical results test, integrity preserved, full-rebuild fallback exercised, ruff + mypy + full suite green, and a before/after cold measurement on a large log recorded in the PR.
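A small harness for the before/after cold measurement might look like the sketch below: each run starts a fresh process, so in-process caches are empty by construction. measure_cold is a hypothetical helper; the command would be a cold CLI call such as zaxy memory checkout with whatever arguments the session needs (not specified here), run once with the checkpoint deleted and once with it present.

```python
import statistics
import subprocess
import sys
import time


def measure_cold(cmd: list[str], runs: int = 5) -> dict[str, float]:
    """Time `cmd` in a fresh process per run; report min/median/max wall seconds."""
    samples: list[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return {"min": min(samples), "median": statistics.median(samples), "max": max(samples)}


if __name__ == "__main__":
    # Placeholder command; substitute the real cold checkout invocation.
    print(measure_cold([sys.executable, "-c", "pass"], runs=3))
```

Reporting the median over several runs damps OS page-cache and scheduling noise; interpreter start-up is included in every sample, so it cancels out in the before/after comparison.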

7. Rejected / superseded alternatives

8. Open questions (resolve during agreement / Phase-1 prototype)

  1. Serialization format + what-to-persist — prototype load-vs-rebuild on the real 115 MB log for: (tokens only + recompute postings) vs (full derived state), in a compact binary vs JSON. Pick the fastest safe load. This is the make-or-break measurement for Phase 1.
  2. Salience ledger state — persist it too (Phase 3), or just route consumers through the shared verified replay? (Leaning: share the replay first; persist only if still hot.)
  3. Write cadence — persist on every cold build only, or also after the tail grows past a threshold? (Leaning: cold build + threshold, to keep CLI writes cheap.)
  4. Checkpoint location & gitignore — confirm .eventloom/projections/ and add the cache dir to ignore rules.
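For question 1, a prototype comparison of the two payload shapes can be sketched as follows: option A persists per-event tokens and recomputes postings on load (skipping tokenization, the dominant cost), while option B persists the finished postings. The names and the JSON encoding are hypothetical, and a toy input only checks the harness; the deciding numbers must come from the real 115 MB log.

```python
import json
import time
from typing import Any, Callable


def postings_from_tokens(tokens_by_event: list[list[str]]) -> dict[str, list[int]]:
    """Recompute postings from persisted per-event tokens, with no re-tokenization."""
    postings: dict[str, list[int]] = {}
    for doc_id, toks in enumerate(tokens_by_event):
        for term in sorted(set(toks)):
            postings.setdefault(term, []).append(doc_id)
    return postings


def timed(fn: Callable[[], Any]) -> tuple[float, Any]:
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


def compare_strategies(tokens_by_event: list[list[str]]) -> dict[str, float]:
    """Time loading each candidate checkpoint payload back into postings."""
    tokens_blob = json.dumps(tokens_by_event)  # option A: tokens only
    full_blob = json.dumps(postings_from_tokens(tokens_by_event))  # option B: full postings
    t_a, from_a = timed(lambda: postings_from_tokens(json.loads(tokens_blob)))
    t_b, from_b = timed(lambda: json.loads(full_blob))
    if from_a != from_b:  # both must rebuild the same structure
        raise RuntimeError("strategies disagree; the comparison is invalid")
    return {
        "tokens_only_load_s": t_a,
        "full_state_load_s": t_b,
        "tokens_only_bytes": float(len(tokens_blob)),
        "full_state_bytes": float(len(full_blob)),
    }
```

Recording payload size alongside load time matters because option A trades a larger file for less stored structure; swapping json for a binary encoder only changes the two dumps/loads pairs.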

9. Done-when

Cold checkout on a large log is materially and measurably faster (trajectory in §6), integrity and byte-identical-results invariants hold with tests proving both the checkpoint path and the full-rebuild fallback, the checkpoint is safe to delete/corrupt (always falls back), ruff + mypy + full suite green, and each phase records a before/after cold measurement.
