KB Arena
Which retrieval architecture works best for your documentation? 8 strategies, tiered-difficulty questions, and empirical evidence so you don't have to guess.
How it works
Same question
Each question, from simple lookups to multi-topic dependency chains, is sent to all 8 strategies simultaneously.
4-pass evaluation
Structural checks, entity coverage, source attribution against your docs, then LLM-as-judge scoring.
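A rough sketch of what the four passes can look like in code. The pass names follow the description above; the concrete checks, the length budget, and the `judge` callable are illustrative assumptions, not the actual implementation:

```python
def evaluate_answer(answer: str, expected_entities: set[str],
                    cited_sources: set[str], corpus_paths: set[str],
                    judge) -> dict:
    """Illustrative 4-pass evaluation. `judge` is any
    callable(answer) -> score in [0, 1]."""
    return {
        # Pass 1: structural checks -- non-empty, within a length budget
        "structural": bool(answer.strip()) and len(answer) < 4000,
        # Pass 2: entity coverage -- fraction of expected entities mentioned
        "entity_coverage": sum(e.lower() in answer.lower()
                               for e in expected_entities)
                           / max(1, len(expected_entities)),
        # Pass 3: source attribution -- every citation must exist in the docs
        "attribution": cited_sources <= corpus_paths,
        # Pass 4: LLM-as-judge score
        "judge": judge(answer),
    }
```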
Ranked report
Accuracy by tier, latency percentiles, reliability rates, and cross-strategy composite ranking across your documentation.
The 8 strategies
Naive Vector
Embed doc pages as chunks, retrieve by cosine similarity. Baseline approach — fast, simple, no cross-topic awareness.
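A minimal sketch of the approach, with a stand-in `embed` function in place of a real embedding model:

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Stand-in embedding function -- swap in any real model (e.g.
    sentence-transformers). Deterministic within a run, unit-normalized."""
    vecs = np.stack([
        np.random.default_rng(abs(hash(t)) % 2**32).normal(size=384)
        for t in texts
    ])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def naive_vector_search(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by cosine similarity to the query embedding."""
    scores = embed(chunks) @ embed([query])[0]  # dot product = cosine on unit vectors
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```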
Contextual Vector
Embed chunks with parent topic context prepended. Better at disambiguating domain-specific terms across large documentation sets.
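The core trick is a one-line transform before embedding; the `Topic:` template here is an assumption, not the exact format used:

```python
def contextualize(chunk: str, parent_topic: str) -> str:
    """Prepend the parent topic so the embedding carries surrounding
    document context, disambiguating terms that recur across topics."""
    return f"Topic: {parent_topic}\n\n{chunk}"
```

The contextualized string is what gets embedded; retrieval otherwise works exactly like the naive variant.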
QnA Pairs
Pre-generate Q&A pairs from docs using an LLM, then embed and retrieve the pairs. High precision on common domain questions.
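Sketched below assuming a generic `llm` callable and a JSON response format; the prompt wording is illustrative:

```python
import json

def generate_qa_pairs(doc_page: str, llm) -> list[dict]:
    """Ask an LLM for Q&A pairs covering one doc page. `llm` is any
    callable(prompt: str) -> str returning JSON text."""
    prompt = (
        "Generate 5 question/answer pairs covering the key facts in the "
        "documentation page below. Reply with a JSON list of objects, "
        'each shaped like {"question": "...", "answer": "..."}.\n\n'
        + doc_page
    )
    return json.loads(llm(prompt))
```

At query time the generated questions are embedded and matched against the user's question; the paired answers come back as the retrieved context.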
Knowledge Graph
Extract entities, components, and dependencies into Neo4j. Query with Cypher templates matched to question intent. Best on multi-topic architectures.
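A sketch of intent-to-Cypher templating with the official neo4j Python driver; the `Component` label and `DEPENDS_ON` relationship are illustrative, not the benchmark's actual schema:

```python
from neo4j import GraphDatabase

# Intent -> Cypher template, chosen by a question-intent classifier.
CYPHER_TEMPLATES = {
    "depends_on": (
        "MATCH (c:Component {name: $name})-[:DEPENDS_ON*1..3]->(d:Component) "
        "RETURN DISTINCT d.name"
    ),
    "used_by": (
        "MATCH (c:Component {name: $name})<-[:DEPENDS_ON]-(u:Component) "
        "RETURN u.name"
    ),
}

def query_graph(intent: str, name: str, uri: str, auth: tuple) -> list[str]:
    """Run the template matching the classified question intent."""
    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _, _ = driver.execute_query(CYPHER_TEMPLATES[intent], name=name)
        return [r[0] for r in records]
```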
Hybrid
Route by intent: factoid → vector, cross-topic → graph, complex → both with Reciprocal Rank Fusion. Adapts per question.
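Reciprocal Rank Fusion itself is small enough to show in full; this follows the standard formulation with the conventional k = 60 constant:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists: each doc scores sum(1 / (k + rank))
    across the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well by both vector and graph retrieval floats to the top.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
```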
RAPTOR
Build a recursive tree of LLM cluster summaries over the corpus. Query all levels simultaneously — leaf chunks + broad topic synthesis for Tier 4/5 questions.
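A deliberately simplified sketch of tree construction. RAPTOR proper uses soft GMM clustering with UMAP dimensionality reduction; plain KMeans keeps the example short, and `embed` / `summarize` are callables you supply:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_raptor_tree(chunks: list[str], embed, summarize, branching: int = 5):
    """Cluster the current level, summarize each cluster into a new node,
    and recurse until one root remains. Returns every node from every level."""
    levels, level = [list(chunks)], list(chunks)
    while len(level) > 1:
        labels = KMeans(n_clusters=max(1, len(level) // branching),
                        n_init="auto").fit_predict(embed(level))
        level = [summarize([t for t, c in zip(level, labels) if c == lab])
                 for lab in sorted(set(labels))]
        levels.append(level)
    return [node for lvl in levels for node in lvl]
```

Querying "all levels simultaneously" is the collapsed-tree strategy: flatten every node into one pool and retrieve over it, so leaf detail and broad summaries compete on equal footing.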
PageIndex
Vectorless, reasoning-based retrieval. Builds a hierarchical tree index from document structure, then uses LLM reasoning to traverse the tree — no embeddings, no chunking.
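A sketch of the traversal loop, assuming a simple `{"title", "children", "text"}` node shape and a generic `llm` callable; the prompt is illustrative of the general approach:

```python
def traverse(node: dict, question: str, llm) -> str:
    """Descend the section tree by asking the LLM which child section
    is most relevant at each level -- no embeddings anywhere."""
    while node.get("children"):
        titles = [c["title"] for c in node["children"]]
        choice = llm(
            f"Question: {question}\nSections: {titles}\n"
            "Reply with the single most relevant section title."
        ).strip()
        node = next((c for c in node["children"] if c["title"] == choice),
                    node["children"][0])  # fall back on an unparseable reply
    return node["text"]
```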
BM25
Classic keyword matching with BM25Okapi scoring. The lexical baseline — no embeddings, no LLM retrieval. Shows whether neural retrieval adds value for your docs.
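With the rank_bm25 package this fits in a few lines; the toy corpus and whitespace tokenizer are just for illustration:

```python
from rank_bm25 import BM25Okapi

docs = [
    "Configure autoscaling groups to launch instances on demand",
    "Functions scale automatically with request volume",
    "Services schedule containers onto a managed cluster",
]
bm25 = BM25Okapi([d.lower().split() for d in docs])  # naive tokenizer

query = "how do functions scale".lower().split()
print(bm25.get_scores(query))            # one BM25 score per document
print(bm25.get_top_n(query, docs, n=1))  # best match by lexical overlap
```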
5 difficulty tiers, auto-generated or hand-crafted