Competitive Positioning

Zaxy's product thesis is "git for agent memory": an event-sourced, replayable, auditable memory fabric that projects durable context into graph, lexical, verbatim, and active working-set views.

What Stands Out

MemPalace Target

MemPalace is the current public comparison target for LLM memory products. Zaxy should compete on trust and provenance first, rather than trying to match every UX surface. The benchmark lane should stay architecture-driven:

  1. Temporal correctness: recover old and current facts without overwriting history.
  2. Source recall: answer with verbatim Eventloom citations and transcript source anchors.
  3. Relational recall: follow graph relationships across goals, tasks, decisions, files, symbols, and test coverage.
  4. Context collapse resistance: preserve identity through compaction and active working-set projection.
  5. Auditability: replay how a memory was written, projected, retrieved, and reinforced.
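As an illustration of the first axis, the sketch below shows how an append-only log can answer both "what was true then" and "what is true now" without overwriting anything. It is a minimal sketch under stated assumptions: `FactEvent`, `EventLog`, and the integer logical clock are hypothetical names chosen for this example, not Zaxy's or Eventloom's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FactEvent:
    """One append-only record: `key` took `value` at logical time `at`."""
    at: int
    key: str
    value: str


class EventLog:
    """Hypothetical append-only log (illustrative, not Zaxy's storage API)."""

    def __init__(self) -> None:
        self._events: List[FactEvent] = []

    def append(self, event: FactEvent) -> None:
        # History is only ever extended; earlier events are never edited.
        self._events.append(event)

    def value_as_of(self, key: str, at: int) -> Optional[str]:
        """Replay the log and return the latest value written at or before `at`."""
        value: Optional[str] = None
        best_at: Optional[int] = None
        for event in self._events:
            if event.key != key or event.at > at:
                continue
            if best_at is None or event.at >= best_at:
                value, best_at = event.value, event.at
        return value

    def current(self, key: str) -> Optional[str]:
        """Return the most recent value for `key`, or None if never written."""
        latest = max((e.at for e in self._events if e.key == key), default=None)
        return None if latest is None else self.value_as_of(key, latest)


log = EventLog()
log.append(FactEvent(at=1, key="deploy_target", value="staging"))
log.append(FactEvent(at=5, key="deploy_target", value="production"))

old_value = log.value_as_of("deploy_target", 3)   # fact as it stood at time 3
current_value = log.current("deploy_target")      # fact as it stands now
```

Because the second write is a new event rather than an update, both the old and the current answer remain recoverable, and the same replay can serve the auditability axis.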

Public Benchmark Posture

The current Zaxy public benchmark hub is benchmarks.md. Zaxy's same-harness evidence should distinguish the archived LongMemEval-compatible 100-question headline from the full 500-question archive.

The 100-question run remains the strongest headline: Zaxy mean score 0.970, Answer@5 0.950, citation coverage 1.000, and R@1/R@5/R@10 1.000, with BM25 in the same report at mean score 0.540, Answer@5 0.500, and R@5 0.840.

The legacy limit=10 full 500-question hash run is a separate no-regression floor: Zaxy checkout mean score 0.626, Answer@5 0.608, citation coverage 1.000, and R@1/R@5/R@10 of 0.944/0.956/0.956 versus BM25 mean score 0.560, Answer@5 0.516, and R@5 0.770.

The current same-harness limit=5 backend-evaluation control uses workload SHA-256 0dc36a139bb9a4fdc7c6cd34400737a58a1eb7410517341f015e9fbfc76ed854 and sets the projection-backend floor at Zaxy checkout mean score 0.714, Answer@5 0.626, citation coverage 1.000, and R@5 0.958.
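Pinning a run to a workload digest only helps if the digest is checked before results are compared against the floor. The sketch below shows one way to do that with Python's standard `hashlib`. The helper names are hypothetical, and exactly which bytes Zaxy hashes to produce the workload SHA-256 (file contents, a canonical serialization, or something else) is an assumption not stated here.

```python
import hashlib

# Workload digest quoted above for the limit=5 backend-evaluation control.
PINNED_WORKLOAD_SHA256 = (
    "0dc36a139bb9a4fdc7c6cd34400737a58a1eb7410517341f015e9fbfc76ed854"
)


def sha256_hex(payload: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of `payload`."""
    return hashlib.sha256(payload).hexdigest()


def same_workload(payload: bytes, pinned: str = PINNED_WORKLOAD_SHA256) -> bool:
    """True only if `payload` hashes to the pinned workload digest.

    Assumption: the workload is hashed as raw bytes; adjust if the real
    harness canonicalizes the workload before hashing.
    """
    return sha256_hex(payload) == pinned.lower()
```

A report whose workload fails this check should be treated as a different experiment rather than compared against the projection-backend floor.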

Competitor numbers belong in an external-disclosure table, not a universal leaderboard. MemPalace publicly reports 96.6% raw LongMemEval R@5 and 98.4% held-out hybrid R@5. Agent Memory publicly reports 95.2% R@5 on LongMemEval-S. Mem0 publicly reports 94.4% LongMemEval accuracy and lower-token memory retrieval, plus LoCoMo accuracy gains; those are different metric families from Zaxy's local retrieval reports unless run through the same harness. These claims are important market context, but they are not same-harness Zaxy results.
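One way to keep disclosed numbers out of a universal leaderboard is to carry provenance on every metric and refuse to rank across it. The sketch below is illustrative only: `ReportedMetric` and `comparable` are hypothetical names, and the values are the R@5 figures quoted in this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedMetric:
    system: str
    metric: str         # metric family, e.g. "R@5"
    value: float
    benchmark: str      # workload the number was measured on
    same_harness: bool  # True only for runs produced by Zaxy's own harness


def comparable(a: ReportedMetric, b: ReportedMetric) -> bool:
    """Two numbers may be ranked only if both are same-harness results
    for the same metric on the same benchmark workload."""
    return (
        a.same_harness
        and b.same_harness
        and a.metric == b.metric
        and a.benchmark == b.benchmark
    )


zaxy = ReportedMetric("Zaxy", "R@5", 1.000, "LongMemEval-compatible 100q", True)
bm25 = ReportedMetric("BM25", "R@5", 0.840, "LongMemEval-compatible 100q", True)
mempalace = ReportedMetric("MemPalace", "R@5", 0.966, "LongMemEval (raw)", False)
agent_memory = ReportedMetric("Agent Memory", "R@5", 0.952, "LongMemEval-S", False)
```

With this shape, the Zaxy and BM25 rows can share a table because they come from the same report, while the MemPalace and Agent Memory rows stay in the external-disclosure table until they are rerun through the same harness.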

Same-Harness Adapter Feasibility

MemPalace is the strongest adapter candidate because its public repo documents a local benchmarks/longmemeval_bench.py path and committed per-question result files. A Zaxy adapter should wrap that command, pin the mode and top-k settings, and import per-question retrieval hits into Zaxy's report schema.
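The import half of such an adapter can be sketched without running the benchmark itself. The example below assumes a hypothetical JSON Lines export with `question_id`, `retrieved_ids`, and `gold_ids` fields; MemPalace's actual per-question file format is not described here, so the field names would need to be mapped to whatever the committed result files contain. It also treats R@k as the share of questions with at least one gold item in the top k, which is an assumption about the metric definition.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QuestionHit:
    question_id: str
    first_hit_rank: Optional[int]  # 1-based rank of first gold item, None if missed


def import_results(jsonl_text: str, top_k: int) -> List[QuestionHit]:
    """Convert per-question retrieval records (hypothetical schema) into rows."""
    rows: List[QuestionHit] = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        gold = set(record["gold_ids"])
        rank: Optional[int] = None
        # Only the pinned top-k window counts, regardless of how many
        # candidates the external system returned.
        for position, item_id in enumerate(record["retrieved_ids"][:top_k], start=1):
            if item_id in gold:
                rank = position
                break
        rows.append(QuestionHit(record["question_id"], rank))
    return rows


def recall_at(rows: List[QuestionHit], k: int) -> float:
    """Share of questions whose first gold hit is ranked at or above `k`."""
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if r.first_hit_rank is not None and r.first_hit_rank <= k)
    return hits / len(rows)


SAMPLE = "\n".join([
    json.dumps({"question_id": "q1", "retrieved_ids": ["a", "b"], "gold_ids": ["b"]}),
    json.dumps({"question_id": "q2", "retrieved_ids": ["c", "d"], "gold_ids": ["z"]}),
])
rows = import_results(SAMPLE, top_k=5)
```

Keeping the first-hit rank per question, rather than only an aggregate, lets the same rows feed R@1, R@5, and R@10 columns in a report.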

Mem0 is a benchmark harness candidate rather than a drop-in retrieval adapter: the public mem0ai/memory-benchmarks project can run LongMemEval, but the OSS path requires Docker, Qdrant, model configuration, and LLM answer/judge choices. The first Zaxy integration should document those inputs and separate retrieval-only comparisons from judge-scored answer accuracy.

Agent Memory remains external disclosure only for now. Its product page reports LongMemEval-S R@5 and a BM25/vector/graph retrieval stack, but the public page does not provide a stable same-harness CLI/API contract for Zaxy to call. Keep the number in the disclosure table until a reproducible command and result export are available.

Near-Term Roadmap

Related pages: benchmark-review.md, benchmarks.md, integrations.md, and architecture.md.