LLM Hallucination Detection
Does your model's response actually check out?
Multi-method verification, fusing NLI entailment, self-consistency sampling, LLM-as-judge, and other signals into a single risk score with per-claim evidence.
Optionally, a source context and the original prompt can be supplied alongside the response to ground the verification.
Detection methods:
- NLI: entailment checking of each claim against the source context
- Judge: an LLM-as-judge verdict on each claim
- Logprobs: token log-probability confidence signals
- QA: question-answer round-trip checks against the context
- Consistency: agreement across resampled responses
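Of these, self-consistency sampling is the easiest to illustrate without external services: resample the model several times and treat claims the samples fail to reproduce as risky. Below is a minimal Python sketch; `sample_model` and `claim_is_supported` are hypothetical stand-ins for an LLM sampling call and an entailment check, not part of this tool.

```python
# Sketch of self-consistency sampling. The two helper functions are
# hypothetical stand-ins: a real system would call an LLM at nonzero
# temperature and run an NLI entailment model, respectively.
import random

def sample_model(prompt: str) -> str:
    """Stand-in for resampling the model at nonzero temperature."""
    return random.choice([
        "The Eiffel Tower is in Paris.",
        "The Eiffel Tower is in Paris, built in 1889.",
        "The Eiffel Tower is in Lyon.",
    ])

def claim_is_supported(claim: str, sample: str) -> bool:
    """Stand-in for an entailment check of the claim against one sample."""
    return claim.lower() in sample.lower()

def consistency_risk(claim: str, prompt: str, n_samples: int = 8) -> float:
    """Fraction of resampled responses that fail to support the claim.

    A claim the model asserts in only a few of many samples is more
    likely hallucinated than one it reproduces consistently.
    """
    samples = [sample_model(prompt) for _ in range(n_samples)]
    unsupported = sum(not claim_is_supported(claim, s) for s in samples)
    return unsupported / n_samples

print(consistency_risk("The Eiffel Tower is in Paris", "Where is the Eiffel Tower?"))
```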
Each scan extracts the claims in the response, verifies each one, and reports an overall risk score from 0 to 100. After a scan, users can mark whether the detection was accurate; this feedback drives the accuracy panel below.
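How the per-method signals might become that single 0-100 number: the sketch below takes a per-claim hallucination probability from each method that ran, averages them with fixed weights, and lets the riskiest claim set the headline score. The weights and the max aggregation are illustrative assumptions, not the tool's actual fusion rule.

```python
# Sketch of score fusion, assuming each method emits a per-claim
# hallucination probability in [0, 1]. Weights are illustrative.
METHOD_WEIGHTS = {
    "nli": 0.30,          # contradiction probability vs. source context
    "judge": 0.25,        # LLM-as-judge verdict mapped to [0, 1]
    "logprobs": 0.15,     # low token confidence -> higher risk
    "qa": 0.15,           # failed question-answer round trips
    "consistency": 0.15,  # disagreement across resampled responses
}

def fuse_claim_scores(method_scores: dict[str, float]) -> float:
    """Weighted average over whichever methods actually ran for a claim."""
    ran = {m: s for m, s in method_scores.items() if m in METHOD_WEIGHTS}
    if not ran:
        return 0.0
    total_weight = sum(METHOD_WEIGHTS[m] for m in ran)
    return sum(METHOD_WEIGHTS[m] * s for m, s in ran.items()) / total_weight

def response_risk(claims: list[dict[str, float]]) -> int:
    """Overall 0-100 risk: the riskiest claim sets the headline score."""
    if not claims:
        return 0
    return round(100 * max(fuse_claim_scores(c) for c in claims))

# Example: one well-supported claim and one contradicted claim.
claims = [
    {"nli": 0.05, "judge": 0.10, "consistency": 0.08},
    {"nli": 0.92, "judge": 0.80, "logprobs": 0.60},
]
print(response_risk(claims))  # -> 81
```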
Results and monitoring views:
- Annotated response
- Claim verification
- Risk distribution
- Method usage
- Score trend over the last 60 scans
- Feedback accuracy by risk level
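A plausible reading of the last panel: bucket past scans into risk bands and measure how often users confirmed the detection in each band. The sketch below works under that assumption; the band cutoffs and the scan record format are made up.

```python
# Sketch of a "feedback accuracy by risk level" computation, assuming
# each scan stores its 0-100 score and a boolean user rating. The band
# boundaries are illustrative assumptions.
from collections import defaultdict

BANDS = [(0, 34, "low"), (34, 67, "medium"), (67, 101, "high")]

def band(score: int) -> str:
    """Map a 0-100 score to its risk band name."""
    return next(name for lo, hi, name in BANDS if lo <= score < hi)

def accuracy_by_risk(scans: list[tuple[int, bool]]) -> dict[str, float]:
    """Fraction of scans users marked accurate, per risk band."""
    hits, totals = defaultdict(int), defaultdict(int)
    for score, accurate in scans:
        b = band(score)
        totals[b] += 1
        hits[b] += accurate
    return {b: hits[b] / totals[b] for b in totals}

scans = [(12, True), (28, True), (55, False), (61, True), (88, True), (91, False)]
print(accuracy_by_risk(scans))  # {'low': 1.0, 'medium': 0.5, 'high': 0.5}
```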