## Code Review: Regex-to-JSON Exam Parsing with Retry

### expert_build/exam.py:_extract_json
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
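REASONING: First/last brace extraction is a sound heuristic for single-object LLM responses. Code-fence stripping is correct. Return type hint says `dict | None` but `json.loads` can in general return lists or scalars. Callers guard with `"key" in data`, which handles a list gracefully (membership is false, so control falls through to retry/fallback) but would raise `TypeError` on a numeric or boolean scalar. Since the parsed slice runs from the first `{` to the last `}`, a successful parse should only ever yield an object, so this is cosmetic, not a bug.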
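
The heuristic described above can be sketched as follows. This is a minimal illustration, not the code under review: the function name `extract_json_sketch` and the fence-stripping regex are assumptions, since the review does not show the body of `_extract_json`.

```python
import json
import re
from typing import Optional


def extract_json_sketch(text: str) -> Optional[dict]:
    """Hypothetical stand-in for `_extract_json` (assumed behavior)."""
    # Drop Markdown fence lines such as ```json or ``` on their own line.
    cleaned = re.sub(r"^```[a-zA-Z]*\s*$", "", text, flags=re.MULTILINE)
    # Slice from the first opening brace to the last closing brace.
    start = cleaned.find("{")
    end = cleaned.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError:
        return None
    # A slice starting with "{" can only parse to an object; the check
    # keeps the type hint honest if the extraction strategy ever changes.
    return data if isinstance(data, dict) else None
```

With this shape, input such as `[1, 2, 3]` or two separate objects in one response yields `None`, so callers only ever see a dict or `None`.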

---

### expert_build/exam.py:extract_answer
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: PARTIAL
INTEGRATION: WIRED
REASONING: Logic is sound. Defensive `str()` coercion on `data["answer"]` is good. Retry gating on `model and prompt` is correct; the sole production caller passes both. Fallback to `response.strip()[:100]` matches prior behavior. Missing one test case: `invoke_sync` raising during retry (analogous to `test_judge_retry_itself_raises`). The broad `except Exception: pass` routes that path to the same fallback the existing fallback test checks, so the outcome is covered even though the `except` branch itself is not directly exercised.

---

### expert_build/exam.py:judge_answer
VERDICT: CONCERN
CORRECTNESS: QUESTIONABLE
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: `data["verdict"].strip().upper()` at lines 163 and 174 will crash with `AttributeError` if `verdict` is not a string (e.g., boolean `true`). `extract_answer` correctly uses `str(data["answer"])` for the same pattern — `judge_answer` should do the same with `str(data["verdict"]).strip().upper()` for consistency and robustness. In practice LLMs return strings here, so this is unlikely to trigger, but it's an inconsistency within the same PR.
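
A small sketch of the suggested coercion, assuming the verdict arrives via `json.loads`. The helper name `normalize_verdict` is hypothetical and only illustrates the `str(...)` wrapping; it is not a function in the PR.

```python
import json


def normalize_verdict(data: dict) -> str:
    # str() makes .strip() and .upper() safe for booleans, numbers, and None.
    return str(data["verdict"]).strip().upper()


payload = json.loads('{"verdict": true}')  # JSON true becomes Python True
# payload["verdict"].strip() would raise:
#   AttributeError: 'bool' object has no attribute 'strip'
print(normalize_verdict(payload))                    # TRUE
print(normalize_verdict({"verdict": " correct\n"}))  # CORRECT
```

A coerced value such as `TRUE` would presumably not match any expected verdict label, so the call would fall through to the retry or fallback path instead of crashing.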

---

### expert_build/prompts.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Prompt format changes are consistent with the code-side JSON parsing. Double-brace escaping (`{{`, `}}`) is correct for Python `.format()` templates. The "Respond with ONLY this JSON" instruction is a standard pattern for structured LLM output.
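
To illustrate the escaping point, here is a minimal example with a made-up template. The real prompt text in `prompts.py` is not shown in this review, so `TEMPLATE` and its `question` field are assumptions.

```python
# Hypothetical template; doubled braces survive .format() as literal braces.
TEMPLATE = (
    "Question: {question}\n"
    'Respond with ONLY this JSON: {{"answer": "<letter>"}}'
)

rendered = TEMPLATE.format(question="What is 2 + 2?")
print(rendered)
# Question: What is 2 + 2?
# Respond with ONLY this JSON: {"answer": "<letter>"}
```

With single braces, `.format()` would treat `{"answer": ...}` as a replacement field and raise `KeyError` instead of rendering the JSON example.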

---

### tests/test_exam.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Good coverage across happy path, retry success, retry failure, fallback, exception handling, and case insensitivity. Tests correctly mock `invoke_sync` at the module level. One minor gap: no test for `extract_answer` when the retry itself raises (the `judge_answer` side has `test_judge_retry_itself_raises` but `extract_answer` doesn't have the equivalent). Another gap: no test for `_extract_json` returning a non-dict (e.g., JSON array `[1,2,3]`) to confirm callers handle it.
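
The two gaps could be closed with tests along these lines. This is a self-contained sketch under stated assumptions: `extract_answer_sketch` is a toy stand-in that takes the retry call as an injected `invoke` parameter, whereas the real `extract_answer` calls `invoke_sync` at module level with a signature not visible in this review.

```python
import json
from unittest.mock import Mock


def extract_answer_sketch(response, invoke=None, prompt=None):
    """Toy stand-in for extract_answer (assumed shape, retry call injected)."""
    def parse(text):
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return None
        if isinstance(data, dict) and "answer" in data:
            return str(data["answer"])
        return None

    answer = parse(response)
    if answer is not None:
        return answer
    if invoke and prompt:
        try:
            retried = parse(invoke(prompt))
            if retried is not None:
                return retried
        except Exception:
            pass  # mirrors the broad except described in the review
    return response.strip()[:100]


def test_extract_retry_itself_raises():
    # The retry call raises; the function should still return the fallback.
    invoke = Mock(side_effect=RuntimeError("boom"))
    assert extract_answer_sketch("not json", invoke=invoke, prompt="p") == "not json"
    invoke.assert_called_once_with("p")


def test_non_dict_json_falls_through():
    # A JSON array parses successfully but is not a dict.
    assert extract_answer_sketch("[1, 2, 3]") == "[1, 2, 3]"
```

In the actual test file, the same cases would patch `invoke_sync` at module level, as the existing tests already do.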

---

### SELF_REVIEW
LIMITATIONS: Could not see `invoke_sync` implementation (observation errored) — cannot verify its signature matches the `timeout` kwarg usage. Could not see the full `cmd_exam` function to verify no other call sites were missed. The `re` import is retained but I couldn't verify it's still used elsewhere in the file without reading the full file.

---

### FEATURE_REQUESTS
- Show the full source file (not just diff hunks) when functions are added/modified, so reviewers can verify import usage and catch dead code
- Include the implementation of called functions (like `invoke_sync`) when they're invoked with new arguments in the diff
