Review of the transition to JSON-based communication for the exam runner. The changes include adding retry logic for LLM parsing failures and a new test suite.

### expert_build/exam.py:_extract_json
VERDICT: CONCERN
CORRECTNESS: QUESTIONABLE
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: ADDRESSES
BELIEF_COMPLIANCE: CONSISTENT
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: The regex fallback `r"\{[^{}]+\}"` used to locate JSON within conversational text is fragile. It cannot handle nested JSON objects (e.g., `{"a": {"b": 1}}`) because the `[^{}]+` pattern stops at the first nested brace. While the current prompts only expect flat JSON, if an LLM includes curly braces in its "explanation" (common in technical/coding domains), this regex will produce invalid JSON fragments. A more robust approach would be to find the first `{` and last `}` or use a balanced-brace parser.

### expert_build/exam.py:extract_answer
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: ADDRESSES
BELIEF_COMPLIANCE: CONSISTENT
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Correctly implements JSON extraction with a retry mechanism. The transition from regex-based extraction to JSON is handled gracefully by the retry logic, though it does assume the "answer" field will be a string (calling `.strip()` on a non-string would raise an error). The fallback to a truncated version of the raw response ensures the system doesn't crash on persistent parsing failures.

### expert_build/exam.py:judge_answer
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: ADDRESSES
BELIEF_COMPLIANCE: CONSISTENT
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Updated to handle JSON verdicts. The implementation of retry logic within this function is consistent with `extract_answer`. Case-insensitivity for the verdict ("CORRECT" vs "correct") is correctly handled.

### expert_build/prompts.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: ADDRESSES
BELIEF_COMPLIANCE: CONSISTENT
TEST_COVERAGE: N/A
INTEGRATION: WIRED
REASONING: Prompts have been updated to explicitly request JSON, which is necessary for the new parsing logic to function reliably.

### tests/test_exam.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: ADDRESSES
BELIEF_COMPLIANCE: CONSISTENT
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: This is a high-quality test file that covers various edge cases including code fences, embedded JSON, invalid JSON, and retry success/failure. The use of mocks for the LLM calls is appropriate and correctly implemented.

### SELF_REVIEW
LIMITATIONS: I was able to verify the internal logic of `exam.py` and its interaction with `prompts.py` and `llm.py`. I confirmed that all production callers of the modified functions within `exam.py` were updated. I did not have visibility into whether external consumers of the `reasons.db` expect specific formats in the "nogood" descriptions that might be affected by the raw response fallbacks.
---

### FEATURE_REQUESTS
- Improve `_extract_json` to handle nested braces. A simple loop finding the first `{` and last `}` is often more robust for LLM outputs than the current regex.
- Add type coercion to `extract_answer` before calling `.strip()` (e.g., `str(data["answer"]).strip()`) to handle cases where an LLM might return a number instead of a string for a choice.
