Here's my review of the changes.

---

## Code Review: Switch propose and coverage to JSON pseudo-tool-calling

### expert_build/llm.py:extract_json
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Generalizes the former `_extract_json` from `exam.py` to also handle JSON arrays. The priority logic (array before object when `[` precedes `{`) is correct for the propose use case and harmlessly falls through to object extraction on parse failure. All three consumers (`exam`, `propose`, `coverage`) import from here. Tests cover plain, code-fenced, embedded-in-text, and invalid cases for both objects and arrays. `RETRY_JSON` constant co-located here is a good fit since every caller of `extract_json` also uses it.
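
For reference, the priority logic described above can be sketched as follows. This is a minimal illustration written for this review, not the code in `llm.py`: the name `extract_json_sketch`, the fence-stripping regex, and the "outermost bracket" heuristic are all assumptions.

```python
import json
import re

# Built indirectly so the example can sit inside a fenced block.
FENCE = "`" * 3


def extract_json_sketch(text):
    """Return the first JSON array or object found in text, else None.

    Hypothetical stand-in for extract_json; the real function may differ.
    """
    # Unwrap a Markdown code fence (optionally tagged "json") if present.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    text = text.strip()

    # Fast path: the whole payload is already a JSON array or object.
    try:
        value = json.loads(text)
        if isinstance(value, (dict, list)):
            return value
    except json.JSONDecodeError:
        pass

    # Collect the outermost [...] and {...} spans with their start offsets.
    candidates = []
    for open_ch, close_ch in (("[", "]"), ("{", "}")):
        start, end = text.find(open_ch), text.rfind(close_ch)
        if start != -1 and end > start:
            candidates.append((start, text[start:end + 1]))

    # Try whichever opener comes first; on parse failure fall through
    # to the other kind.
    for _, snippet in sorted(candidates, key=lambda c: c[0]):
        try:
            return json.loads(snippet)
        except json.JSONDecodeError:
            continue
    return None
```

With this sketch, `[` preceding `{` yields the array, and a non-JSON bracketed span such as `[note]` falls through to the object that follows it.
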
---

### expert_build/llm.py:RETRY_JSON
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Moved from `exam.py` to the shared `llm.py` module. All three consumers import it. The observation data confirms zero stale references remain.
---

### expert_build/exam.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Straightforward migration: removes the local `_extract_json` and `RETRY_JSON` and imports both from `llm`. All call sites are updated. No behavioral change to `extract_answer` or `judge_answer`. Removing the `json` import is correct since `_extract_json` was its only consumer.
---

### expert_build/coverage.py:cmd_cert_coverage
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: The old text-line parsing (`result.strip().upper() != "NONE"`, split by newline) is replaced with structured JSON parsing of `{"matching_ids": [...]}`. Retry logic is correct: if the initial response isn't a dict with `matching_ids`, a retry is issued; if that also fails, the loop body silently produces no matches and falls through to keyword matching (line 132). The `str(bid).strip()` call at line 125 is a nice defensive touch against numeric or whitespace-padded IDs. Five tests cover the happy path, empty matches, retry, code fences, and invalid IDs.
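
The retry-and-filter flow described above can be sketched roughly as below. The function name, the `ask_llm` callable, the way the retry prompt is appended, and the plain `json.loads` parser are all hypothetical stand-ins; only the `{"matching_ids": [...]}` shape and the `str(bid).strip()` normalization come from the reviewed change.

```python
import json


def parse_or_none(text):
    """Simplified stand-in for extract_json: strict parse, None on failure."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None


def match_ids_with_retry(ask_llm, prompt, retry_suffix, known_ids):
    """Ask once, retry once on a malformed reply, then filter to known IDs."""
    result = parse_or_none(ask_llm(prompt))
    if not (isinstance(result, dict) and "matching_ids" in result):
        # One retry with an extra "reply in JSON" instruction (assumed wiring).
        result = parse_or_none(ask_llm(prompt + "\n\n" + retry_suffix))

    matched = []
    if isinstance(result, dict) and isinstance(result.get("matching_ids"), list):
        for bid in result["matching_ids"]:
            bid = str(bid).strip()  # tolerate numeric or padded IDs
            if bid in known_ids:    # drop IDs the model invented
                matched.append(bid)
    # An empty list lets the caller fall back to keyword matching.
    return matched
```

A usage example with a canned two-reply fake LLM:

```python
replies = iter(["sorry, not json", '{"matching_ids": [" b1 ", 7, "zzz"]}'])
ids = match_ids_with_retry(lambda _p: next(replies), "match?", "JSON only.", {"b1", "7"})
```
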
---

### expert_build/propose.py:cmd_propose_beliefs
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Replaces fragile regex-based markdown heading parsing with `extract_json` returning a list. The retry-then-skip-batch pattern is clean: unparseable batches no longer write garbage to the output file, which is an improvement over the old behavior. The output file still uses the markdown `### [ACCEPT/REJECT]` format for human review, preserving the `accept-beliefs` workflow. Dedup filtering is now a simple dict key check instead of regex matching, which eliminates the class of bugs where LLM output differed subtly from the expected heading format. The `b.get("id", "unknown")` / `b.get("claim", "")` defaults are reasonable defensive coding against malformed LLM output.
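
As an illustration of the skip-batch and dedup behavior, here is a small sketch. The helper name `render_batch`, the `existing` mapping, and the body layout under each heading are assumptions made for this example; only the `### [ACCEPT/REJECT] {bid}` heading and the two `.get` defaults are taken from the reviewed code.

```python
def render_batch(parsed, existing):
    """Render parsed beliefs as review markdown, or None to skip the batch.

    parsed   -- whatever the JSON extractor returned (ideally a list of dicts)
    existing -- dict keyed by belief ID, used for the dedup check
    """
    if not isinstance(parsed, list):
        return None  # unparseable batch: write nothing rather than garbage

    lines = []
    for b in parsed:
        if not isinstance(b, dict):
            continue  # ignore stray non-object entries
        bid = b.get("id", "unknown")
        if bid in existing:
            continue  # dedup by plain key lookup, no regex involved
        lines.append(f"### [ACCEPT/REJECT] {bid}")
        lines.append(b.get("claim", ""))
        lines.append("")
    return "\n".join(lines)
```
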
---

### expert_build/prompts.py:PROPOSE_BELIEFS
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Removes the old markdown format instructions and replaces them with a JSON array schema. The double-brace escaping (`{{`, `}}`) is correct for `.format()` templates. The schema example is clear and includes all fields the parser expects. The rules section is preserved above the entries.
---

### expert_build/prompts.py:CERT_MATCH
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Clean switch from "one per line, or NONE" to `{"matching_ids": [...]}` with empty-array example. The double-brace escaping is correct.
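
To show why the doubled braces matter, here is a small sketch of a `.format()` template that embeds a literal JSON example. The template text and the `entries` placeholder are made up for illustration and are not the actual `CERT_MATCH` prompt.

```python
# Hypothetical prompt template; {{ and }} render as literal { and }.
TEMPLATE = (
    "Entries:\n{entries}\n\n"
    'Respond with JSON only: {{"matching_ids": ["id-1", "id-2"]}}\n'
    'If nothing matches, respond with {{"matching_ids": []}}'
)

prompt = TEMPLATE.format(entries="- id-1: example entry")

# Without the escaping, str.format treats the JSON braces as a
# replacement field and raises instead of producing a prompt.
try:
    '{"matching_ids": []}'.format(entries="x")
    unescaped_ok = True
except (KeyError, ValueError, IndexError):
    unescaped_ok = False
```
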
---

### tests/test_coverage.py (new file)
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Five well-structured tests covering the key scenarios: valid JSON matching, empty matches with fallback, retry on non-JSON, code fences, and invalid belief ID filtering. The `beliefs_db` fixture creates an empty file that passes the `Path(db_path).exists()` check while `load_beliefs` is mocked, which is correct. Minor note: the `beliefs_file` arg receives a `Path` object, which works because `cmd_cert_coverage` calls `str(args.beliefs_file)`.
---

### tests/test_exam.py (updated)
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Import updated from `_extract_json` to `extract_json` from `llm`. Three new array tests added (`test_extract_json_array`, `test_extract_json_array_in_text`, `test_extract_json_array_with_code_fence`). All existing tests preserved with the updated function name. No behavioral changes to the `extract_answer`/`judge_answer` tests.
---

### tests/test_propose.py (updated)
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: The `_json_beliefs` helper is clean and DRY. The old `test_filter_regex_matches_accept_reject_placeholder` was correctly replaced with `test_json_retry_on_bad_response` and `test_json_with_code_fence`, which exercise the new parse paths that replaced the regex parsing. All existing batch/crash/dedup/append tests are updated to use JSON responses. Test count is maintained.
---

### SELF_REVIEW
LIMITATIONS: Could not run the test suite (permission not granted) to verify all tests pass. Review is based on code reading only. I also did not see the `accept-beliefs` command that consumes the markdown output to verify the output format is still compatible, though the format (`### [ACCEPT/REJECT] {bid}`) appears unchanged from what the old code produced.
---

**Summary:** This is a clean, well-tested refactoring. The `extract_json` consolidation eliminates duplication, the JSON parsing is more robust than the regex/text-line approach it replaces, and the retry-then-skip pattern handles edge cases gracefully. No bugs found. All callers updated, no stale references remain.
