### expert_build/cli.py:main
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: CONSISTENT
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: The cost summary reporting is correctly placed in a `finally` block to ensure it displays even after errors or interrupts. Using `sys.stderr` for the cost summary is appropriate for CLI tools to keep `stdout` clean for piped output.
---

### expert_build/llm.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: CONSISTENT
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: The transition to JSON output for cost tracking is well-implemented. Crucially, `_parse_cli_json` preserves the return type of `invoke` as a plain string (extracting the response text from the JSON structure), which avoids breaking existing callers that expect a raw string response. The cost accumulation logic handles both Gemini and Claude formats correctly, including Claude's prompt caching tokens.
---

### tests/test_llm.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: CONSISTENT
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: New tests provide comprehensive coverage for the JSON parsing logic, including various edge cases like non-JSON output, multi-model Gemini stats, and Claude's caching tokens. The use of a fixture to reset global state between tests ensures test isolation.
---

### SELF_REVIEW
LIMITATIONS: I am assuming the `claude` and `gemini` CLI tools support the added flags (`--output-format json` and `--skip-trust -o json` respectively) as indicated by the PR intent. I verified integration by checking the call sites of `invoke_sync` in the provided observations.
---

### FEATURE_REQUESTS
- It would be helpful to see the `extract_json` function in `expert_build/llm.py` if it was also modified, as several callers use it immediately after `invoke_sync`.
---
