Numeric summaries pending a complete Bedrock Kimi canonical profile
Every memorybench summary on this page is in the
pending_first_run state. The benchmark plumbing is complete:
provider plugin, local and self-hosted runners, tolerance checks, and
the repro kit have all shipped. The canonical judge is
moonshotai.kimi-k2.5 through Bedrock.
Three explicit paths can produce complete summaries:
- Manual self-hosted dispatch for an artifact-only CI rehearsal.
- Local laptop run with make memory-bench-local for the fast canonical profile.
- Reviewed promotion PR that moves a complete local or self-hosted run into canonical JSON.
Latest canonical numbers
LongMemEval memscore
-
No real run yet
LoCoMo memscore
-
No real run yet
ConvoMem memscore
-
No real run yet
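
The placeholder cards above follow a simple rule: a summary still in the pending_first_run state shows a dash and "No real run yet" instead of a number. A minimal sketch of that rule is given here, assuming a hypothetical `Summary` record and `render_card` helper. Neither name comes from the memorybench code, and the three-decimal formatting of a completed score is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# State name taken from this page; everything else below is hypothetical.
PENDING = "pending_first_run"


@dataclass
class Summary:
    """Illustrative stand-in for one benchmark summary record."""
    benchmark: str
    status: str
    memscore: Optional[float] = None


def render_card(summary: Summary) -> List[str]:
    """Return the three text lines of a summary card."""
    title = f"{summary.benchmark} memscore"
    # Never print a number for a summary that has no real run behind it.
    if summary.status == PENDING or summary.memscore is None:
        return [title, "-", "No real run yet"]
    # Assumed display format for a completed run.
    return [title, f"{summary.memscore:.3f}", summary.status]


if __name__ == "__main__":
    for name in ("LongMemEval", "LoCoMo", "ConvoMem"):
        print("\n".join(render_card(Summary(name, PENDING))))
```

Treating a missing memscore the same as the pending state keeps a partially written record from rendering as a real result.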
Per-benchmark detail
| Benchmark | Status | memscore | accuracy | latency_ms | context_tokens | run_at (UTC) | Supermemory claim |
|---|---|---|---|---|---|---|---|