## Code Review

### expert_build/propose.py:cmd_propose_beliefs
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: The refactoring moves from "accumulate all proposals then write" to "write each batch immediately." This is a sound change for crash resilience. Key correctness points:

1. **`appended` flag** is computed before the header write, so the final print message is correct.
2. **`successful_entries` accumulation** — `_save_processed` is called with the growing list each iteration. This re-hashes all prior entries on each call (O(n²) in entry count), but is correct since `_save_processed` starts from the original `processed` dict and merges all successful entries. The redundant hashing is negligible for typical batch sizes.
3. **`skipped` counter** is correctly scoped per-batch and accumulated into `total_skipped`.
4. **Regex fix**: the old regex `\[?(?:ACCEPT|REJECT)\]?` fails on `[ACCEPT/REJECT]`. After `ACCEPT` matches, the optional `\]?` matches nothing, and the required space that follows cannot match `/`, so the whole match fails. The new regex adds an explicit `\[ACCEPT/REJECT\]` alternative, which handles the placeholder correctly.
5. **`batch_paths[i]` indexing** — `i` from `enumerate(batches)` always aligns with `batch_paths` since both lists are built from the same loop.
6. Callers in `cli.py` and `pipeline.py` are unaffected — the function signature is unchanged.
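The regex failure in point 4 can be reproduced in isolation. The sketch below is a reconstruction, not code copied from `propose.py`: the `### ` heading prefix, the trailing `(\S+)` capture, the sample belief IDs, and the names `OLD_HEADING`, `NEW_HEADING`, and `heading_ids` are all assumptions made for illustration. Only the bracketed ACCEPT/REJECT fragment comes from the review.

```python
import re

# Assumed heading shape: "### [ACCEPT/REJECT] <belief-id>".
# Both patterns are illustrative reconstructions around the fragment
# quoted in the review; the real patterns may differ.
OLD_HEADING = re.compile(r"^### \[?(?:ACCEPT|REJECT)\]? (\S+)", re.MULTILINE)
NEW_HEADING = re.compile(
    r"^### (?:\[ACCEPT/REJECT\]|\[?(?:ACCEPT|REJECT)\]?) (\S+)", re.MULTILINE
)


def heading_ids(pattern, text):
    """Return the belief IDs captured from every matching heading line."""
    return pattern.findall(text)


sample = (
    "### [ACCEPT/REJECT] cache-ttl\n"  # unedited placeholder
    "### [ACCEPT] retry-policy\n"      # reviewer chose ACCEPT
    "### REJECT old-flag\n"            # brackets dropped
)

print(heading_ids(OLD_HEADING, sample))  # misses the placeholder heading
print(heading_ids(NEW_HEADING, sample))  # picks up all three headings
```

Under these assumptions the old pattern skips `cache-ttl`, which is why a placeholder proposal would not be recognized as an existing belief during dedup.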

Minor nit: when every proposal in a batch is filtered out, the code still appends a bare `\n\n` to the file. Cosmetic only.
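For reference, the write-per-batch pattern under review can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `cmd_propose_beliefs` body: `write_batches`, `generate`, and the header text are hypothetical names, and the sketch assumes a failed batch is skipped while later batches still run. It also includes a guard that would avoid the bare `\n\n` write noted above.

```python
from pathlib import Path


def write_batches(batches, generate, out_path):
    """Append each batch's proposals as soon as they are produced.

    `batches` is a list of lists of entry names. `generate` is a callable
    that returns proposal text for one batch and may raise. Both are
    hypothetical stand-ins, not the real propose.py API.
    """
    out_path = Path(out_path)
    # Decide "appended vs. created" before anything is written, so the
    # final status message cannot be confused by the header write.
    appended = out_path.exists() and out_path.stat().st_size > 0
    if not appended:
        out_path.write_text("# Proposed beliefs\n\n")

    successful = []
    for batch in batches:
        try:
            text = generate(batch).strip()
        except Exception:
            # Assumption: a failed batch is skipped; earlier output stays on disk.
            continue
        if text:  # guard: skip the write when filtering left nothing
            with out_path.open("a") as fh:
                fh.write(text + "\n\n")
        # Only entries from batches that completed are recorded as processed.
        successful.extend(batch)
    return appended, successful
```

Because each batch is flushed to disk before the next one starts, a failure in a later batch cannot lose proposals that were already generated.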

---

### tests/test_propose.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Six well-structured tests covering the key behavioral changes:

- **Crash resilience** (`test_proposals_written_after_each_batch`): Verifies batch 1 proposals survive a batch 2 crash. The `call_count` trick correctly maps to batch ordering.
- **Happy path** (`test_all_batches_written_on_success`): Both batches' beliefs appear in output.
- **Dedup filtering** (`test_existing_beliefs_filtered_per_batch`): Existing belief IDs are stripped, new ones kept.
- **Processed tracking** (`test_failed_batch_entries_not_marked_processed`): Failed batch entries aren't marked processed — verified via JSON deserialization of the processed file. Entry sort order (alphabetical) ensures batch 1 = entry0+entry1, batch 2 = entry2+entry3.
- **Regex fix** (`test_filter_regex_matches_accept_reject_placeholder`): Directly tests the `[ACCEPT/REJECT]` placeholder dedup.
- **Append mode** (`test_appends_to_existing_output_file`): Pre-existing content preserved when appending.

The mocking strategy is consistent — `check_model_available`, `invoke_sync`, `_load_existing_beliefs`, and `_has_embeddings` are all patched. The `make_args` helper doesn't set an `entry` attribute, which correctly exercises the directory-scan path (`hasattr(args, 'entry')` returns False).
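The `call_count` approach used for the crash-resilience test can be illustrated with a small standalone sketch. The helper names `make_flaky_invoke` and `run` are hypothetical, and the real test may track the count differently (for example with a closure counter); the only behavior relied on here is that `unittest.mock.Mock` increments `call_count` before it runs `side_effect`.

```python
from unittest.mock import Mock


def make_flaky_invoke(fail_on_call):
    """Build a mock that raises on the Nth call and succeeds otherwise."""

    def side_effect(prompt):
        # Mock bumps call_count before invoking side_effect,
        # so the first call observes call_count == 1.
        if mock.call_count == fail_on_call:
            raise RuntimeError("simulated LLM failure")
        return f"proposal {mock.call_count}"

    mock = Mock(side_effect=side_effect)
    return mock


def run(invoke, prompts):
    """Hypothetical stand-in for the batch loop: collect results until a failure."""
    results = []
    for prompt in prompts:
        try:
            results.append(invoke(prompt))
        except RuntimeError:
            break
    return results


invoke = make_flaky_invoke(fail_on_call=2)
print(run(invoke, ["batch 1", "batch 2", "batch 3"]))
```

Since batches are invoked in order, the Nth call corresponds to the Nth batch, which is what lets a test assert that batch 1 output exists even though batch 2 raised.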

Not covered, but these are pre-existing gaps rather than regressions from this change: the `--entry` flag, the `--all` flag, embedding-based dedup, and the all-batches-fail scenario.

---

### SELF_REVIEW
LIMITATIONS: Could not verify that `invoke_sync` is imported from `.llm`, or what exceptions it actually raises (the observation reported "Function not found"), so I am trusting that the mocks match real behavior. Could not see the `_filter_unprocessed` function to verify that it works correctly with the paths produced by `successful_entries.extend(Path(p) for p in batch_paths[i])`. Could not run the tests to confirm they pass.

---

### FEATURE_REQUESTS
- Include ability to run the test suite and report pass/fail as part of the review observations
- Show `git log --oneline` for the branch to understand the commit progression and whether earlier commits in the PR affect the review

---
