### expert_build/exam.py:extract_answer
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: The function now requests JSON from the model and retries when parsing fails, which is more robust than the previous regex-based approach. Although the heuristic letter-finding regexes are removed, the retry mechanism and the fallback to the first 100 characters should maintain or improve reliability with modern LLMs. The call site in `cmd_exam` is correctly updated to pass the required retry context.
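For illustration, the JSON + retry + truncation fallback described above could look roughly like the sketch below. It is not the reviewed implementation: the name `extract_answer_sketch`, the `invoke` callable, the `retry_prompt` parameter, and the `"answer"` key are all hypothetical, and applying the 100-character fallback to the most recent response is an assumption.

```python
import json


def extract_answer_sketch(response, invoke, retry_prompt, max_retries=1):
    """Parse {"answer": ...} from an LLM response, retrying on failure.

    `invoke` is a hypothetical callable (prompt -> response text) standing in
    for whatever the real code uses to re-query the model.
    """
    text = response
    for attempt in range(max_retries + 1):
        try:
            payload = json.loads(text)
            if isinstance(payload, dict) and "answer" in payload:
                return str(payload["answer"]).strip()
        except json.JSONDecodeError:
            pass  # fall through to the retry below
        if attempt < max_retries:
            # Ask the model again, reminding it to answer in JSON.
            text = invoke(retry_prompt)
    # Assumed fallback: first 100 characters of the last response seen.
    return text[:100].strip()
```

A caller would pass the raw model output plus whatever context is needed to re-ask the question; the retry budget bounds the extra model calls.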
---

### expert_build/exam.py:judge_answer
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: This function follows the same robust JSON + retry pattern as `extract_answer`. It properly handles the extraction of both the verdict and the explanation from the JSON payload, providing a cleaner interface for judging semantic correctness of answers.
---

### expert_build/exam.py:_extract_json
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: A solid utility function for extracting JSON from LLM responses. It handles common variations such as markdown code fences and "conversational" prefix/suffix text by using a brace-finding fallback.
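As a rough sketch of the technique named above (direct parse, then markdown code fence, then brace-finding fallback), assuming the helper returns a `dict` or `None`. The name `extract_json_sketch` and the exact order of attempts are illustrative assumptions, not the reviewed code.

```python
import json
import re

# Build the fence marker programmatically so this example can itself
# live inside a fenced code block.
_FENCE = "`" * 3
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL)


def _try_load(candidate):
    """Return the parsed dict, or None if candidate is not a JSON object."""
    try:
        value = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def extract_json_sketch(text):
    """Best-effort extraction of a JSON object from an LLM response."""
    # 1. The whole response may already be valid JSON.
    result = _try_load(text.strip())
    if result is not None:
        return result

    # 2. Look inside a markdown code fence, with or without a "json" tag.
    match = _FENCE_RE.search(text)
    if match:
        result = _try_load(match.group(1).strip())
        if result is not None:
            return result

    # 3. Brace-finding fallback: first "{" through last "}" skips any
    #    conversational prefix or suffix text.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        return _try_load(text[start:end + 1])
    return None
```

The brace-finding step is deliberately simple: it can fail when the surrounding prose itself contains braces, which is one reason a retry at the caller level is still useful.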
---

### expert_build/prompts.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: N/A
INTEGRATION: WIRED
REASONING: The prompts for both answering and judging are correctly updated to mandate JSON output. This is a necessary and consistent change to support the new parsing logic.
---

### tests/test_exam.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: This new test file covers the JSON extraction, answer extraction, and judging logic. It mocks `invoke_sync` to simulate LLM failures and successes, so the retry logic is exercised on both the failure and the recovery paths.
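The mocking approach described above can be sketched with `unittest.mock.Mock` and `side_effect`, which returns a different canned response on each call. The function `ask_json` below is a hypothetical stand-in for the code under test so that the example is self-contained; the real tests patch `invoke_sync` instead.

```python
import json
from unittest.mock import Mock


def ask_json(invoke, prompt, max_retries=2):
    """Hypothetical stand-in: call `invoke` until it returns valid JSON."""
    for _ in range(max_retries + 1):
        raw = invoke(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output, try again
    return None


def test_retries_until_valid_json():
    # First call simulates a malformed reply, second a well-formed one.
    fake_invoke = Mock(side_effect=["not json", '{"verdict": "correct"}'])
    assert ask_json(fake_invoke, "judge this") == {"verdict": "correct"}
    assert fake_invoke.call_count == 2


def test_gives_up_after_max_retries():
    # Every call fails, so the helper stops after 1 + max_retries attempts.
    fake_invoke = Mock(return_value="still not json")
    assert ask_json(fake_invoke, "judge this", max_retries=1) is None
    assert fake_invoke.call_count == 2
```

Asserting on `call_count` is what distinguishes a test of the retry logic from a test of parsing alone.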
---

### SELF_REVIEW
LIMITATIONS: No limitations. Full function bodies and all production callers were verified via observation results.
---
