LLM Hallucination Guard — $0, Local, 3ms
Last updated: June 2026
LLM hallucinations — where models confidently state things not supported by the evidence they were given — are a critical problem in production AI systems. Entroly's WITNESS is a local, deterministic hallucination guard that checks every LLM response against the evidence it was given at $0 cost and ~3ms latency.
| Method | Accuracy | AUROC | Cost and latency |
|---|---|---|---|
| WITNESS + STAVE | 85.8% | 0.844 | $0, ~3ms |
| GPT-4o-mini judge | 86.3% | not reported | LLM API call cost + latency |
| GPT-3.5-turbo (HaluEval paper) | 62.6% | not reported | not reported |
WITNESS statistically ties a strong LLM judge (85.8% vs. 86.3% accuracy) while running entirely locally.
The LLM Hallucination Problem
When an LLM generates a response, it can produce claims that are not supported by the context it was given. In coding agents, this manifests as:
- Referencing functions that don't exist in the codebase
- Stating that a configuration option works a certain way when it doesn't
- Citing file paths or imports that are wrong
- Making architectural claims not supported by the code it saw
Standard approaches use an LLM as a judge — sending both the context and the response to a second LLM call for verification. This adds API cost and latency on every response. WITNESS eliminates both.
How WITNESS Works
Evidence Grounding
WITNESS checks each claim in the model's response against the specific evidence chunks that were included in the context. A claim is "grounded" if it can be traced back to at least one evidence chunk. A claim is "unsupported" if no evidence chunk entails it.
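As a rough illustration of the grounded/unsupported distinction, the sketch below labels each claim by its best token overlap with any evidence chunk. This is not WITNESS's implementation: the function names, the tokenizer, and the 0.5 threshold are all assumptions made for the example, and lexical overlap is only a crude stand-in for entailment.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately simple, hypothetical tokenizer."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def support_score(claim: str, chunk: str) -> float:
    """Fraction of the claim's tokens that also appear in the evidence chunk."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(chunk)) / len(claim_tokens)

def label_claim(claim: str, chunks: list[str], threshold: float = 0.5) -> str:
    """'grounded' if at least one chunk clears the (assumed) threshold."""
    best = max((support_score(claim, c) for c in chunks), default=0.0)
    return "grounded" if best >= threshold else "unsupported"

evidence = ["def authenticate(user, password): return check(user, password)"]
print(label_claim("authenticate takes user and password", evidence))
print(label_claim("the retry limit defaults to five attempts", evidence))
```

The first claim shares three of its five tokens with the evidence and is labeled grounded; the second shares none and is labeled unsupported. A purely lexical check like this cannot tell "takes two arguments" from "takes three arguments" reliably, which is why an entailment signal is needed on top of it.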
STAVE — Statistical Token Alignment Verification Engine
STAVE is the core statistical verifier. It uses token-level alignment between response claims and evidence chunks, combined with NLI (Natural Language Inference) signals. The combined signal achieves 0.844 AUROC on HaluEval-QA without any LLM calls.
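For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen hallucinated example receives a higher suspicion score than a randomly chosen supported one. A minimal sketch of that computation follows; the scores and labels are made up for illustration and are not HaluEval data or STAVE outputs.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Pairwise AUROC: P(positive score > negative score), ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Illustrative values only: label 1 = hallucinated, score = verifier's suspicion.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the metric describes ranking quality independently of whichever decision threshold is chosen.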
Local DeBERTa NLI (Optional)
For higher accuracy, enable local DeBERTa NLI inference:
ENTROLY_LOCAL_NLI=1 entroly proxy
This runs a small NLI model locally for per-claim entailment checking. It adds roughly 20ms of latency but improves accuracy at $0 marginal cost.
Verification Profiles
WITNESS ships with workload-specific profiles:
| Profile | Behavior | Use For |
|---|---|---|
| rag | Fail closed: suppress unsupported claims | RAG applications, document Q&A |
| qa | Fail closed | Question answering systems |
| code | Fail closed: flag hallucinated APIs | Code generation agents |
| chat | Warn: flag but don't suppress | Conversational AI |
| summary | Warn | Summarization tasks |
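One way to picture the table is as a mapping from profile name to an action taken on unsupported claims. The sketch below is illustrative only: `Action`, `PROFILE_ACTIONS`, and `apply_profile` are hypothetical names, not part of Entroly's API, and the real profiles may differ in more than this one setting.

```python
from enum import Enum

class Action(Enum):
    FAIL_CLOSED = "fail_closed"  # suppress unsupported claims
    WARN = "warn"                # flag them but leave the response intact

# Hypothetical mapping that mirrors the profile table.
PROFILE_ACTIONS = {
    "rag": Action.FAIL_CLOSED,
    "qa": Action.FAIL_CLOSED,
    "code": Action.FAIL_CLOSED,
    "chat": Action.WARN,
    "summary": Action.WARN,
}

def apply_profile(profile: str, claims: list[str], unsupported: set[str]) -> dict:
    """Return the claims to emit plus any warnings, according to the profile."""
    action = PROFILE_ACTIONS[profile]
    flagged = [c for c in claims if c in unsupported]
    if action is Action.FAIL_CLOSED:
        kept = [c for c in claims if c not in unsupported]
        return {"claims": kept, "warnings": [], "suppressed": flagged}
    return {"claims": list(claims), "warnings": flagged, "suppressed": []}

claims = ["authenticate takes two arguments", "it retries five times"]
unsupported = {"it retries five times"}
print(apply_profile("code", claims, unsupported))
print(apply_profile("chat", claims, unsupported))
```

Under the assumed `code` profile the unsupported claim is removed from the output, while under `chat` it is kept and reported as a warning.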
Using WITNESS
In the Proxy (Automatic, Inline)
# Enable WITNESS on all proxy responses
entroly proxy --witness strict --witness-profile code
# Every response gets a proof certificate automatically
Via CLI (Manual Verification)
entroly witness \
  --context-file evidence.txt \
  --output-file answer.txt \
  --mode strict
Via Python SDK
from entroly.witness import WitnessVerifier

verifier = WitnessVerifier(profile="code")
result = verifier.verify(
    context="def authenticate(user, password): ...",
    response="The authenticate function takes three arguments",
)
print(result.verdict)         # UNSUPPORTED
print(result.flagged_claims)  # ["takes three arguments"]
Via MCP Tool
The entroly_witness MCP tool is available to any MCP-connected client (Cursor, Claude Code, VS Code). The model can call it directly to verify its own output against supplied evidence.
Context Receipts — Know What Evidence Was Used
For WITNESS to work, it needs to know what evidence was included in the context. Entroly's Context Receipts track exactly what was selected, what was omitted, and why — giving WITNESS a precise evidence set to verify against.
entroly select --query "fix the auth bug" --budget 8000
entroly receipt .entroly/receipts/latest.json
# shows: selected chunks, omitted chunks, ranking reasons, fingerprints
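To make the idea of a receipt concrete, here is a minimal sketch of budgeted selection that records what was kept, what was dropped, and why. The receipt layout, the field names, the relevance scores, and the whitespace-based token count are all assumptions for illustration; Entroly's actual receipt format is not shown in this document.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Short SHA-256 fingerprint so a chunk can be identified later."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

def select_with_receipt(chunks: list[tuple[str, float]], budget: int) -> dict:
    """Greedily keep the highest-scoring chunks that fit the token budget."""
    receipt = {"budget": budget, "selected": [], "omitted": []}
    used = 0
    for text, score in sorted(chunks, key=lambda c: c[1], reverse=True):
        cost = len(text.split())  # crude token estimate (assumption)
        entry = {"fingerprint": fingerprint(text), "score": score, "tokens": cost}
        if used + cost <= budget:
            used += cost
            receipt["selected"].append({**entry, "reason": "fits budget"})
        else:
            receipt["omitted"].append({**entry, "reason": "over budget"})
    receipt["tokens_used"] = used
    return receipt

# Hypothetical (text, relevance score) pairs.
chunks = [
    ("def authenticate(user, password): ...", 0.92),
    ("def render_footer(): ...", 0.10),
    ("class Session: token expiry logic ...", 0.75),
]
receipt = select_with_receipt(chunks, budget=10)
print(receipt["tokens_used"], len(receipt["selected"]), len(receipt["omitted"]))
```

Because every chunk carries a fingerprint, a verifier can later confirm that the evidence it checks against is the same text that was placed in the context.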
Hallucination Guard in Agent Pipelines
In multi-agent or agentic systems, WITNESS can run at every handoff point. Entroly's Witness-Verified Handoff feature checks the output of one agent before it's passed as context to the next — preventing hallucination propagation across agent boundaries.
from entroly.verified_handoff import VerifiedHandoff

handoff = VerifiedHandoff(profile="code")
verified_context = handoff.check_and_filter(
    agent_output=agent_1_result,
    evidence=context_used_by_agent_1,
)
# pass verified_context to agent_2
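The handoff pattern can be sketched independently of any particular verifier: split the upstream agent's output into claims, keep only those a checker accepts, and forward the rest. In the sketch below, `filter_handoff`, the sentence splitter, and the toy `mentions_known_function` checker are hypothetical stand-ins, not Entroly's implementation.

```python
import re
from typing import Callable

def filter_handoff(agent_output: str, evidence: str,
                   is_supported: Callable[[str, str], bool]) -> tuple[str, list[str]]:
    """Return (verified text to forward, claims that were dropped)."""
    # Naive sentence split; a real system would use a proper claim extractor.
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", agent_output) if s.strip()]
    kept = [c for c in claims if is_supported(c, evidence)]
    dropped = [c for c in claims if not is_supported(c, evidence)]
    return " ".join(kept), dropped

def mentions_known_function(claim: str, evidence: str) -> bool:
    """Toy checker: every `name()` the claim cites must be defined in the evidence."""
    cited = re.findall(r"\b(\w+)\(\)", claim)
    defined = set(re.findall(r"def\s+(\w+)\s*\(", evidence))
    return all(name in defined for name in cited)

evidence = "def authenticate(user, password): ...\ndef logout(session): ..."
output = "Call authenticate() first. Then call refresh_token() to extend the session."
verified, dropped = filter_handoff(output, evidence, mentions_known_function)
print(verified)
print(dropped)
```

Here the sentence citing `refresh_token()` is dropped because no such function appears in the evidence, so the downstream agent never sees it as established context.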
Install and Enable WITNESS
pip install entroly
# Enable in proxy mode
entroly proxy --witness strict
# Test locally with no API key
entroly witness --context-file mycode.py --output-file ai_response.txt
Stop LLM Hallucinations in Your Pipeline
$0. 3ms. Local. Open-source Apache-2.0.
pip install entroly && entroly proxy --witness strict