### expert_build/exam.py:_extract_json
VERDICT: PASS
CORRECTNESS: QUESTIONABLE
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: The regex fallback `\{[^{}]+\}` cannot match a JSON object whose values contain `{` or `}` characters (e.g. `"explanation": "use {braces} here"`); on such input it matches the inner `{braces}` span instead, which is not valid JSON. The same limitation applies to nested objects. This is unlikely for the current answer/verdict schemas but worth noting. The code-fence stripping is reasonable for single-block responses. Tests cover all five paths (plain, fenced, embedded, invalid, truncated).
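
A minimal reproduction of the brace limitation, using only the pattern quoted above. The helper names (`first_flat_match`, `parses_as_json`) and the sample strings are illustrative and are not taken from `exam.py`.

```python
import json
import re

# The fallback pattern quoted in the review: a brace pair with no braces inside.
FLAT_OBJECT = re.compile(r"\{[^{}]+\}")

def first_flat_match(text):
    """Return the first span matched by the fallback pattern, or None."""
    match = FLAT_OBJECT.search(text)
    return match.group(0) if match else None

def parses_as_json(text):
    """Report whether text is valid JSON, without raising."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

plain = 'Sure: {"answer": "B", "explanation": "see chapter 2"}'
braced = 'Sure: {"answer": "B", "explanation": "use {braces} here"}'

plain_hit = first_flat_match(plain)    # the whole object
braced_hit = first_flat_match(braced)  # only the inner "{braces}" span

print(plain_hit, parses_as_json(plain_hit))
print(braced_hit, parses_as_json(braced_hit))
```

If this case ever matters, one possible alternative (an assumption, not something the change under review does) is `json.JSONDecoder().raw_decode` starting at the first `{`, which handles braces inside string values.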

---

### expert_build/exam.py:extract_answer
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Signature change is backward-compatible (new params default to `None`). Retry guard `if model and prompt` correctly skips the retry when called without context. The fallback `response.strip()[:100]` is unchanged from before. One minor robustness note: `data["answer"].strip()` would raise `AttributeError` if the LLM returns a non-string value for "answer" (e.g. `{"answer": 42}`), but this is an unlikely edge case given the prompt. Caller at line 232 correctly passes model and prompt. Tests cover: happy path, whitespace, retry success, retry failure, no-retry, and code fence.
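
A sketch of the non-string edge case and one possible guard. `answer_text` and `unguarded` are hypothetical helpers written for this note; stringifying numeric answers is a design assumption, not behavior confirmed in `extract_answer`.

```python
import json

def answer_text(data):
    """Hypothetical guard: return a stripped answer string, or None if unusable."""
    value = data.get("answer") if isinstance(data, dict) else None
    if isinstance(value, str):
        return value.strip()
    # Numbers are stringified rather than rejected; bool is excluded because
    # it is a subclass of int and is unlikely to be a meaningful answer.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return None

def unguarded(data):
    """Mirror of the pattern quoted in the review, for comparison."""
    return data["answer"].strip()

payload = json.loads('{"answer": 42}')
try:
    unguarded(payload)
    unguarded_error = None
except AttributeError as exc:
    # int has no .strip(), so the unguarded pattern fails here.
    unguarded_error = type(exc).__name__

print(unguarded_error, answer_text(payload))
```

Returning `None` for unusable values would let the caller fall through to its existing retry or truncation fallback instead of crashing.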

---

### expert_build/exam.py:judge_answer
VERDICT: CONCERN
CORRECTNESS: QUESTIONABLE
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: PARTIAL
INTEGRATION: WIRED
REASONING: Unlike `extract_answer`, `judge_answer` retries unconditionally on non-JSON; there is no guard because `model` is required. Every non-JSON initial response therefore costs a second LLM call. That is acceptable as intentional behavior, but the `except Exception: pass` on the retry path silently swallows all errors (network timeout, rate limit, etc.) with no logging. The WARN print fires *before* the retry attempt, but no WARN is emitted if the retry itself errors; only the final "could not parse verdict JSON" message fires. **Missing test**: the case where the retry `invoke_sync` call itself raises (e.g. timeout) is untested. The `except Exception: pass` catches it, but no test verifies the fallback `(False, "no verdict")` on that specific path. `test_judge_handles_llm_error` only tests the *initial* call raising.
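
A self-contained sketch of both points: logging the retry error instead of discarding it, and exercising the path where the retry raises. `judge_sketch` is a hypothetical stand-in, not the real `judge_answer`; its signature, the retry suffix, and the log messages are assumptions, and only the `(False, "no verdict")` fallback comes from the code under review.

```python
import json
from unittest.mock import Mock

def judge_sketch(invoke, prompt, log=print):
    """Hypothetical stand-in for judge_answer: one retry, retry errors logged."""
    def parse(text):
        try:
            data = json.loads(text)
        except (TypeError, ValueError):
            return None
        return data if isinstance(data, dict) and "verdict" in data else None

    data = parse(invoke(prompt))
    if data is None:
        log("WARN: verdict was not JSON, retrying")
        try:
            data = parse(invoke(prompt + "\nRespond with JSON only."))
        except Exception as exc:
            # Record the failure instead of `pass`, so timeouts and rate
            # limits on the retry are visible in the output.
            log(f"WARN: retry failed: {exc!r}")
            data = None
    if data is None:
        return (False, "no verdict")
    return (str(data["verdict"]).upper() == "CORRECT", data.get("explanation", ""))

# First call returns non-JSON; the retry raises, as a timeout would.
invoke = Mock(side_effect=["not json", TimeoutError("retry timed out")])
messages = []
result = judge_sketch(invoke, "Q?", log=messages.append)
print(result, messages)
```

A real test would patch `invoke_sync` where `exam.py` imports it and assert on the same fallback tuple.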

---

### expert_build/prompts.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Clean prompt migration from `ANSWER:/EXPLANATION:` and `VERDICT:/EXPLANATION:` regex format to JSON. The double-brace escaping (`{{`, `}}`) is correct for Python `.format()` strings — the LLM sees literal `{` and `}`. The "CORRECT or WRONG" phrasing in the verdict prompt is clear.
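
To illustrate the escaping point, a short check with a made-up template (the real `EXAM_ANSWER` and `EXAM_JUDGE` texts are not reproduced here; `TEMPLATE` and its `question` field are assumptions):

```python
# Hypothetical template in the same style: single braces are format fields,
# doubled braces become literal braces in the rendered prompt.
TEMPLATE = (
    "Question: {question}\n"
    'Reply as JSON: {{"answer": "...", "explanation": "..."}}'
)

rendered = TEMPLATE.format(question="What is 2 + 2?")
print(rendered)
```

Leaving the JSON braces unescaped would make `.format()` raise `KeyError` or `ValueError`, so the doubled form is required for templates rendered this way.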

---

### tests/test_exam.py
VERDICT: CONCERN
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: PARTIAL
INTEGRATION: WIRED
REASONING: Good coverage of the main paths. `pytest` is imported but unused (no `pytest.raises`, no parametrize, no fixtures). Missing tests: (1) `judge_answer` retry path where the retry *itself* raises an exception, (2) `_extract_json` with nested braces in values, (3) `extract_answer` retry path where `invoke_sync` raises. The `test_judge_fallback_after_failed_retry` test uses a single `return_value` for both the initial and retry calls — correct but could be clearer with a `side_effect` showing two distinct non-JSON responses to make intent obvious.
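
A small illustration of the `side_effect` suggestion, using `unittest.mock.Mock` directly. The mock name and response strings are placeholders; the actual test would patch `invoke_sync` where `exam.py` looks it up.

```python
from unittest.mock import Mock

# Two distinct non-JSON responses make it explicit that the first value is
# the initial call and the second is the retry.
fake_invoke = Mock(side_effect=["I think it is correct.", "Still not JSON."])

first = fake_invoke("initial prompt")
second = fake_invoke("initial prompt plus retry instruction")

# call_args_list records each call in order, so a test can also assert
# that the retry prompt differs from the initial one.
prompts_seen = [call.args[0] for call in fake_invoke.call_args_list]
print(first, second, prompts_seen)
```

With a list `side_effect`, an unexpected third call raises `StopIteration`, which also guards against accidental extra retries.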

---

### SELF_REVIEW
LIMITATIONS: Could not see the `invoke_sync` implementation: it is imported from `.llm`, but the observation for it returned "not found", so its exception behavior and timeout semantics are unverified. As a result, I can't confirm what exceptions it raises or whether appending the retry message to the prompt could exceed any token limits. Could not verify whether the `EXAM_JUDGE` and `EXAM_ANSWER` prompt templates are the complete prompts or are composed with other context.

---

### FEATURE_REQUESTS
- Include the implementation of imported functions when they're called in modified code (e.g. `invoke_sync` from `.llm`)
- Flag untested exception paths automatically (e.g. `except Exception: pass` blocks without corresponding test coverage)

---
