harness · advisor
A local memory layer that gates its own recall
A raw pgvector table drifts silently and carries no trust card on its chunks — you cannot tell a Spark-measured fact from an external claim, and a re-index can quietly drop recall with no alarm. Orionfold Cortex wraps the index with stamped provenance, a coverage report, and a recall@k promotion gate, all dispatched and watched through the Arena control plane. The machine manages its own memory.
- Re-index a multi-source corpus (article · lineage · eval · scout · deep_research) with provenance stamped per chunk, dispatched from the cockpit
- Score chunk-recall@k + slug-recall@k against an in-repo gold set, gated like-for-like against the prior index so a rebuild can't silently regress recall
- Query the Second Brain with a provenance/trust-tier filter, returning cited hits that a hosted RAG can't honestly attribute
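As a rough illustration of the last two bullets, here is a minimal Python sketch of a trust-tier filter, a recall@k scorer, and a like-for-like promotion gate. Everything named here is an assumption for illustration, not Cortex's actual API: the `Hit` fields, the tier labels `"measured"` and `"external"`, the gold-row keys `question` / `gold_chunk_id` / `gold_slug`, and the `tolerance` parameter are all hypothetical. It also assumes recall@k counts a question as a hit when the gold chunk (or the gold slug) appears anywhere in the top k results.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Hit:
    """One retrieved chunk with its stamped provenance (hypothetical shape)."""
    chunk_id: str
    slug: str
    source_class: str  # article | lineage | eval | scout | deep_research
    trust_tier: str    # hypothetical labels, e.g. "measured" or "external"


def filter_by_trust(hits: Iterable[Hit], allowed: set[str]) -> list[Hit]:
    """Keep only hits whose stamped trust tier is in the allowed set."""
    return [h for h in hits if h.trust_tier in allowed]


def recall_at_k(
    gold_rows: list[dict],
    retrieve: Callable[[str], list[Hit]],
    k: int = 5,
) -> dict[str, float]:
    """Fraction of gold questions whose gold chunk / gold slug is in the top k."""
    if not gold_rows:
        raise ValueError("gold set is empty; recall is undefined")
    chunk_hits = slug_hits = 0
    for row in gold_rows:
        top = retrieve(row["question"])[:k]
        chunk_hits += any(h.chunk_id == row["gold_chunk_id"] for h in top)
        slug_hits += any(h.slug == row["gold_slug"] for h in top)
    n = len(gold_rows)
    return {"chunk_recall": chunk_hits / n, "slug_recall": slug_hits / n}


def promotion_gate(
    candidate: dict[str, float],
    prior: dict[str, float],
    tolerance: float = 0.0,
) -> bool:
    """Pass only if every metric of the prior index is matched or beaten."""
    return all(candidate[m] + tolerance >= prior[m] for m in prior)
```

The gate compares the same metrics, on the same gold set and the same `k`, for the candidate and the prior index, which is what makes the comparison like-for-like. A rebuild that lowers either score fails the gate instead of being promoted silently.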
Audience — DGX Spark operators running a private, local-first RAG recall layer they drive, not a SaaS.
| Metric | qa-eval.jsonl · 44 held-out Q · cosine-only · top_k=5 · GB10 measured baseline (sweet spot) |
|---|---|
| chunk-recall@5 | 0.41 |
| slug-recall@5 | 0.73 |
- Reranker absent on GB10: the cosine-only score over top-5 retrieval is the floor, not the reranked ceiling; 1 reranker lane is unsupported on GB10 (NGC 410-gone, no -dgx-spark profile), so rerank=True hard-raises rather than mislabel a score (R22).
- Generator-side metrics not in this lane: 3 of 3 generator-side scores (faithfulness / correctness / refusal-rate) are left null because they need the generator NIM; this is the retrieval-only recall measurement.
- Source-class population: the multi-source provenance schema is live across 5 classes (article · lineage · eval · scout · deep_research) but only the article class is populated today (313/313 chunks across 49 published articles); the other 4 ingest paths are wired but unpopulated.
- Gold-set size: recall is measured over 44 qa-eval rows, not a large-N guarantee.
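The first two caveats describe a reporting discipline: raise rather than mislabel, and leave unmeasured scores null rather than guess. A minimal Python sketch of that behaviour follows. The function name `retrieval_report`, the exception `RerankerUnavailable`, and the report keys are hypothetical and chosen only for illustration; they are not taken from the Cortex codebase.

```python
from typing import Optional


class RerankerUnavailable(RuntimeError):
    """Raised when a reranked score is requested but no reranker lane exists."""


def retrieval_report(
    chunk_recall: float,
    slug_recall: float,
    rerank: bool = False,
) -> dict[str, Optional[float]]:
    """Build a retrieval-only report; generator-side scores stay None."""
    if rerank:
        # Refuse outright: a cosine-only score must never be labelled as reranked.
        raise RerankerUnavailable("no reranker lane on this host")
    return {
        "chunk_recall@5": chunk_recall,
        "slug_recall@5": slug_recall,
        # These need a generator, so they are left null, not estimated.
        "faithfulness": None,
        "correctness": None,
        "refusal_rate": None,
    }
```

Calling `retrieval_report(0.41, 0.73)` returns the two measured recall values with the three generator-side fields set to `None`, while passing `rerank=True` raises instead of returning a score.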