Output equivalence is not process equivalence
Review Punchline
Final-output equivalence: preserved
Process regression: caught
CI gate result: blocked
Candidate is missing 1 evidence link that the baseline recorded: claim-duration no longer links to evidence-duration.
Claim Boundary
This report is not a compliance attestation.
This artifact does not certify safety.
Display note: external URLs and sensitive-looking values may be redacted in this HTML; source artifacts remain unchanged.
Final-Output Comparison
- Visible output equivalence: preserved
- Case coverage: same case IDs
- Baseline evaluation state: pass
- Candidate evaluation state: fail
| Case | Baseline recommendation | Candidate recommendation | Baseline outcome | Candidate outcome | Equivalence |
|---|---|---|---|---|---|
| shared-source-multi-claim | approve | approve | approve | approve | preserved |
Baseline Process Evidence
| Case | Claims | Evidence refs | Linked claims | Policy states | Human review | Provider/model | Tools |
|---|---|---|---|---|---|---|---|
| shared-source-multi-claim | claim-duration | evidence-duration | claim-duration | none | not required | not recorded | benefit-policy-lookup |
Candidate Process Evidence
| Case | Claims | Evidence refs | Linked claims | Policy states | Human review | Provider/model | Tools |
|---|---|---|---|---|---|---|---|
| shared-source-multi-claim | claim-duration | none | none | none | not required | not recorded | benefit-policy-lookup |
Missing Evidence Link Diff
| Case | Claim | Baseline evidence refs | Candidate evidence refs |
|---|---|---|---|
| shared-source-multi-claim | claim-duration | evidence-duration | none |
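The diff above can be reproduced with a small set comparison. The following is a minimal sketch, not the tool's actual implementation: the function name `missing_evidence_links` and the dictionary layout keyed by `(case, claim)` are assumptions made for illustration, and only the case, claim, and evidence identifiers come from this report.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: (case ID, claim ID) -> set of evidence refs linked to that claim.
LinkMap = Dict[Tuple[str, str], Set[str]]


def missing_evidence_links(
    baseline: LinkMap, candidate: LinkMap
) -> List[Tuple[str, str, List[str], List[str]]]:
    """Return one row per claim whose candidate run lost a baseline evidence ref."""
    rows = []
    for (case, claim), base_refs in sorted(baseline.items()):
        cand_refs = candidate.get((case, claim), set())
        # A row is reported only when the baseline had a ref the candidate lacks.
        if base_refs - cand_refs:
            rows.append((case, claim, sorted(base_refs), sorted(cand_refs)))
    return rows


# Values taken from the process evidence tables in this report.
baseline_links = {("shared-source-multi-claim", "claim-duration"): {"evidence-duration"}}
candidate_links = {("shared-source-multi-claim", "claim-duration"): set()}
diff_rows = missing_evidence_links(baseline_links, candidate_links)
```

With the values from this report, `diff_rows` holds a single row for `claim-duration`, with `evidence-duration` on the baseline side and an empty list on the candidate side.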
Process Invariant Findings
| Case | Control | Target | Reason code | State | Message |
|---|---|---|---|---|---|
| shared-source-multi-claim | material_claims_have_evidence | claim:claim-duration | MATERIAL_CLAIM_MISSING_EVIDENCE | fail | fixture-declared material claim has no evidence link |
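One way a control like `material_claims_have_evidence` might be evaluated is sketched below. The `Finding` dataclass and the function signature are hypothetical and are not taken from the tool that produced this report; the control name, target format, reason code, state, and message mirror the row above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Finding:
    """Hypothetical record mirroring the columns of the findings table."""
    case: str
    control: str
    target: str
    reason_code: str
    state: str
    message: str


def check_material_claims_have_evidence(
    case: str, material_claims: Iterable[str], evidence_links: Dict[str, Set[str]]
) -> List[Finding]:
    """Emit a failing finding for every material claim with no linked evidence."""
    findings = []
    for claim in material_claims:
        # A missing key and an empty set both count as "no evidence link".
        if not evidence_links.get(claim):
            findings.append(
                Finding(
                    case=case,
                    control="material_claims_have_evidence",
                    target=f"claim:{claim}",
                    reason_code="MATERIAL_CLAIM_MISSING_EVIDENCE",
                    state="fail",
                    message="fixture-declared material claim has no evidence link",
                )
            )
    return findings


case_id = "shared-source-multi-claim"
baseline_findings = check_material_claims_have_evidence(
    case_id, ["claim-duration"], {"claim-duration": {"evidence-duration"}}
)
candidate_findings = check_material_claims_have_evidence(
    case_id, ["claim-duration"], {"claim-duration": set()}
)
```

Under these assumptions the baseline yields no findings and the candidate yields exactly one, which matches the table.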
Comparison Classification
- Classification: new_failure
- Baseline state: pass
- Candidate state: fail
- Provenance changes: none
- Verdict findings (1 item): MATERIAL_CLAIM_MISSING_EVIDENCE
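The classification follows from the pair of evaluation states. The sketch below assumes a simple lookup; only the `new_failure` label appears in this report, and the other three labels are hypothetical placeholders chosen for illustration.

```python
# Only "new_failure" is taken from the report. The remaining labels are
# hypothetical names for the other state pairs.
_CLASSIFICATIONS = {
    ("pass", "fail"): "new_failure",
    ("fail", "pass"): "fixed",             # hypothetical label
    ("fail", "fail"): "existing_failure",  # hypothetical label
    ("pass", "pass"): "unchanged_pass",    # hypothetical label
}


def classify_comparison(baseline_state: str, candidate_state: str) -> str:
    """Map (baseline state, candidate state) to a comparison classification."""
    try:
        return _CLASSIFICATIONS[(baseline_state, candidate_state)]
    except KeyError:
        # Unknown states are rejected rather than silently treated as passing.
        raise ValueError(
            f"unknown state pair: {baseline_state!r}, {candidate_state!r}"
        ) from None


classification = classify_comparison("pass", "fail")
```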
Fixture-Equivalence State
- State: pass
- Review note: Baseline and candidate fixture material remained equivalent for comparison.
CI Gate Result
- Result: blocked
- Evidence packet: packet-duration
- Packet limitations (1 item): Local deterministic fixture evidence for human review.
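The gate decision illustrates the report's main point: preserved final output alone is not enough to pass. A minimal sketch of such a rule follows. The function name, the `passed` label, and the choice to also block when output equivalence is not preserved are assumptions, since the report shows only the `blocked` outcome for a `new_failure`.

```python
# Assumption: these classifications block the gate. Only "new_failure"
# is shown as blocking in this report.
BLOCKING_CLASSIFICATIONS = frozenset({"new_failure"})


def ci_gate_result(output_equivalence: str, classification: str) -> str:
    """Return "blocked" unless both the output and the process checks are clean."""
    if output_equivalence != "preserved":
        return "blocked"  # assumption: an output change also blocks the gate
    if classification in BLOCKING_CLASSIFICATIONS:
        return "blocked"  # process regression despite identical visible output
    return "passed"  # hypothetical label; the report never shows a non-blocked result


gate = ci_gate_result("preserved", "new_failure")
```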
Artifact Paths
| Artifact | Path |
|---|---|
| baseline run set | baseline.runset.json |
| candidate run set | candidate.runset.json |
| evidence packet | evidence-packet.json |
Artifact Digests
| Role | Path | SHA-256 |
|---|---|---|
| evaluation-summary | packet artifact digest | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
| compiled-suite | compiled.json | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
| candidate-runset | candidate.json | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
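A reviewer who wants to check an artifact against a listed digest could recompute it locally. The sketch below uses Python's standard `hashlib`; the helper names `sha256_hex` and `digest_matches` are illustrative and not part of the reporting tool. The digests shown in this report are uniform strings of `a`, so they appear to be fixture placeholders and would not be expected to match real file contents.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected value, ignoring hex letter case."""
    return sha256_hex(path) == expected_hex.strip().lower()
```

For example, `digest_matches("compiled.json", expected)` would return `True` only if the file's contents hash to the `expected` value from the table.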