KB Arena

Which retrieval architecture works best for your documentation? 7 strategies, tiered difficulty questions — empirical evidence so you don't have to guess.

How it works

1. Same question

Each question — from simple lookups to multi-topic dependency chains — is sent to all 7 strategies simultaneously.

2. 4-pass evaluation

Structural checks, entity coverage, source attribution against your docs, then LLM-as-judge scoring.
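As a rough illustration of the entity coverage pass, the sketch below scores an answer by the fraction of expected entities it mentions. The function name, the case-insensitive substring match, and the sample entities are assumptions made for illustration, not KB Arena's actual implementation.

```python
def entity_coverage(answer, expected_entities):
    """Fraction of expected entities that appear in the answer (case-insensitive)."""
    if not expected_entities:
        return 1.0  # nothing to cover, so treat as fully covered
    text = answer.lower()
    found = [e for e in expected_entities if e.lower() in text]
    return len(found) / len(expected_entities)

# Hypothetical answer and entity list
coverage = entity_coverage(
    "Lambda functions can be triggered by S3 events.",
    ["Lambda", "S3", "SQS"],
)
```

A substring match is the simplest possible check; a real pass would likely need alias handling and word-boundary matching.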

3. Ranked report

Accuracy by tier, latency percentiles, reliability rates, and cross-strategy composite ranking across your documentation.
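Two of the report's ingredients, accuracy by tier and latency percentiles, could be computed along these lines. This is a minimal sketch: the record fields (`tier`, `score`, `latency_ms`) and the nearest-rank percentile method are assumptions, and the composite ranking formula is not specified here, so it is left out.

```python
import math
from collections import defaultdict

def percentile(values, pct):
    """Nearest-rank percentile (assumed method) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def accuracy_by_tier(results):
    """Mean score per difficulty tier."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r["tier"]].append(r["score"])
    return {tier: sum(s) / len(s) for tier, s in sorted(buckets.items())}

# Hypothetical per-question results for one strategy
results = [
    {"tier": 1, "score": 1.0, "latency_ms": 95},
    {"tier": 1, "score": 0.5, "latency_ms": 105},
    {"tier": 4, "score": 0.25, "latency_ms": 300},
]
tier_accuracy = accuracy_by_tier(results)
latencies = [r["latency_ms"] for r in results]
p50, p95 = percentile(latencies, 50), percentile(latencies, 95)
```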

The 7 strategies

Naive Vector

Embed doc pages as chunks, retrieve by cosine similarity. Baseline approach — fast, simple, no cross-topic awareness.
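The retrieval step can be sketched in a few lines. The vectors below are made-up three-dimensional stand-ins for real embeddings, and the chunk ids are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, chunk_vecs, k=3):
    """Return the ids of the k chunks most similar to the query."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [cid for _, cid in scored[:k]]

# Toy embeddings (hypothetical)
chunks = {
    "ec2-pricing": [0.9, 0.1, 0.0],
    "lambda-limits": [0.1, 0.9, 0.2],
    "ecs-tasks": [0.0, 0.2, 0.9],
}
top = retrieve([1.0, 0.2, 0.0], chunks, k=2)
```

In practice a vector store would do this search; the brute-force loop only shows what is being ranked.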

Contextual Vector

Embed chunks with parent topic context prepended. Better at disambiguating domain-specific terms across large documentation sets.
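A minimal sketch of the prepending step, assuming the parent context is available as a list of section titles. The bracketed header format and the function name are illustrative choices, not the project's actual format.

```python
def contextualize(chunk_text, topic_path):
    """Prefix a chunk with its parent topic path before embedding."""
    header = " > ".join(topic_path)
    return f"[{header}]\n{chunk_text}"

# Hypothetical chunk and topic path
embedded_text = contextualize(
    "The default timeout can be raised in the function configuration.",
    ["AWS Compute", "Lambda", "Limits"],
)
```

The same sentence under a different topic path would now embed differently, which is what helps with ambiguous terms.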

QnA Pairs

Pre-generate Q&A pairs from docs using an LLM, then embed and retrieve the pairs. High precision on common domain questions.

Knowledge Graph

Extract entities, components, and dependencies into Neo4j. Query with Cypher templates matched to question intent. Best on multi-topic architectures.
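Template selection might look like the sketch below. The node label `Component`, the relationship type `DEPENDS_ON`, the intent names, and the three-hop limit are all hypothetical; the real graph schema may differ.

```python
# Hypothetical Cypher templates keyed by question intent
CYPHER_TEMPLATES = {
    "dependencies": (
        "MATCH (a:Component {name: $name})-[:DEPENDS_ON*1..3]->(b:Component) "
        "RETURN DISTINCT b.name AS name"
    ),
    "dependents": (
        "MATCH (a:Component)-[:DEPENDS_ON*1..3]->(b:Component {name: $name}) "
        "RETURN DISTINCT a.name AS name"
    ),
}

def build_query(intent, entity):
    """Pick the template for an intent and bind the entity as a parameter."""
    template = CYPHER_TEMPLATES.get(intent)
    if template is None:
        raise KeyError(f"no Cypher template for intent {intent!r}")
    return template, {"name": entity}

query, params = build_query("dependencies", "Lambda")
```

Passing the entity as a query parameter rather than formatting it into the string keeps the templates safe from injection and lets the database cache the query plan.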

Hybrid

Route by intent: factoid → vector, cross-topic → graph, complex → both with Reciprocal Rank Fusion. Adapts per question.
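The routing and fusion steps can be sketched as follows. Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over the rankings it appears in; k = 60 is the commonly used constant and is an assumption here, as are the intent labels and document ids.

```python
def route(intent):
    """Map a question intent to the retrievers that should handle it."""
    routes = {
        "factoid": ["vector"],
        "cross-topic": ["graph"],
        "complex": ["vector", "graph"],
    }
    return routes[intent]

def rrf(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical rankings from the two retrievers
vector_hits = ["a", "b", "c"]
graph_hits = ["c", "a", "d"]
fused = rrf([vector_hits, graph_hits])
```

Because RRF uses only ranks, it needs no score normalization between the vector and graph retrievers.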

RAPTOR

Build a recursive tree of LLM cluster summaries over the corpus. Query all levels simultaneously — leaf chunks + broad topic synthesis for Tier 4/5 questions.
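Querying all levels at once amounts to flattening the tree into a single candidate pool, so that summaries and leaf chunks compete in the same ranking. The sketch below assumes the tree is already built; the `Node` class and sample texts are hypothetical, and the clustering and LLM summarization steps are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    level: int  # 0 = leaf chunk, higher = broader summary
    children: list = field(default_factory=list)

def collapse(root):
    """Flatten the tree so every level is searchable in one pass."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out

# Hypothetical two-level tree
leaves = [Node("EC2 instance types", 0), Node("Lambda concurrency", 0)]
root = Node("Summary: AWS compute options", 1, children=leaves)
candidates = collapse(root)
```

Each candidate would then be embedded and scored against the query like any other chunk.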

PageIndex

Vectorless, reasoning-based retrieval. Builds a hierarchical tree index from document structure, then uses LLM reasoning to traverse the tree — no embeddings, no chunking.
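The traversal can be sketched as a loop that asks a chooser which child to descend into. In the sketch below, `keyword_pick` is a crude stand-in for the LLM reasoning step, and the tree layout (`title` and `children` keys) is an assumption for illustration.

```python
def traverse(node, question, pick_child):
    """Walk from the root to a leaf, letting pick_child choose each branch."""
    path = [node["title"]]
    while node.get("children"):
        node = pick_child(question, node["children"])
        path.append(node["title"])
    return path

def keyword_pick(question, children):
    """Stand-in for the LLM: pick the child whose title shares the most words."""
    words = set(question.lower().split())
    return max(children, key=lambda c: len(words & set(c["title"].lower().split())))

# Hypothetical index built from document headings
tree = {
    "title": "AWS Compute",
    "children": [
        {"title": "EC2 instances", "children": []},
        {
            "title": "Lambda functions",
            "children": [
                {"title": "Lambda timeout limits"},
                {"title": "Lambda pricing"},
            ],
        },
    ],
}
path = traverse(tree, "what are the lambda timeout limits", keyword_pick)
```

Swapping `keyword_pick` for a function that prompts an LLM with the question and the child titles gives the reasoning-based version.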

BM25

Classic keyword matching with BM25Okapi scoring. The lexical baseline — no embeddings, no LLM retrieval. Shows whether neural retrieval adds value for your docs.
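For reference, Okapi BM25 scoring fits in a short function. This is a from-scratch sketch rather than the project's code: it uses whitespace tokenization, the parameters k1 = 1.5 and b = 0.75, and the non-negative IDF variant ln((N - n + 0.5) / (n + 0.5) + 1), all of which are assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical mini-corpus
docs = ["lambda timeout limit", "ec2 instance pricing", "lambda memory pricing"]
scores = bm25_scores("lambda timeout", docs)
```

The first document matches both query terms, the third matches one, and the second matches none, so the scores fall in that order.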

5 difficulty tiers, auto-generated or hand-crafted

AWS Compute

Tier 1 — Factoid
Tier 2 — Procedural
Tier 3 — Comparative
Tier 4 — Relational
Tier 5 — Multi-hop

Built with

Python 3.11+, Pydantic v2, FastAPI, Neo4j 5, ChromaDB, Anthropic Claude, OpenAI Embeddings, Next.js 14, Tailwind CSS, Recharts