Cross-Domain Parameter Sweep
5 corpora × 3 parameters × 108 queries each — validating parameter robustness across domains
Chunk Size Sweep (256, 512, 768, 1024)
Finding: MRR varies by <2.5% across all chunk sizes for every corpus.
The original software engineering corpus shows the most sensitivity (0.54–0.56 MRR range);
all cross-domain corpora are flatter (<1% spread for legal, medical).
Hit Rate@5 is nearly identical across all values.
Chunk size 512 is confirmed as a safe default across domains.
Chunk Overlap Sweep (0, 25, 50, 100)
Finding: MRR spread is <1% for all corpora across all overlap values.
Hit Rate@5 is virtually identical. Overlap has no measurable impact on retrieval quality
when Arcane Recall expansion is active.
Overlap 50 retained as standard — no corpus benefits from changing it.
Expansion Similarity Threshold Sweep (0.85–0.95)
Finding: Ranking metrics are identical across all thresholds for all corpora.
The threshold controls only token volume: 0.85 produces ~40% more tokens than 0.95.
The current default of 0.92 sits in the sweet spot — moderate token savings with zero quality cost.
Threshold 0.92 confirmed cross-domain.
Summary: Parameter Sensitivity by Corpus
| Corpus | Domain | Docs | Chunk Size MRR Range | Overlap MRR Range | Threshold MRR Range |
| Original | Software Engineering | 89 | 0.025 | 0.008 | 0.001 |
| Legal | Contract Law, Regulation | 90 | 0.005 | 0.009 | 0.000 |
| Medical | Clinical Medicine | 91 | 0.005 | 0.000 | 0.000 |
| API Reference | REST API Docs | 95 | 0.008 | 0.009 | 0.005 |
| Narrative | History, Nature, Food | 89 | 0.022 | 0.009 | 0.000 |
Verdict: All three parameters show <2.5% MRR variation across all tested values
and all five corpora. The current defaults (chunk_size=512, overlap=50, threshold=0.92) are
cross-domain validated. No parameter change is needed for deploying against
legal, medical, API reference, or narrative corpora.
Caveat: These corpora are LLM-generated (Claude 3 Haiku). Real-world documents
may have more structural variation. The eval queries (108 per corpus) target 55–57 of ~90 docs.
Relevance Ward thresholds remain corpus-dependent and should be recalibrated per
Threshold Calibration.