## Code Review: `index-sources` command

### expert_build/cli.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: CLI argument parser and dispatch are correctly wired. The `_lazy` call points to `index_sources` module and `cmd_index_sources` function, which matches the new module. Arguments are well-defined with sensible defaults. The `--type` choices match what the schema supports.
---

### expert_build/index_sources.py:_init_db
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Table creation is correct. The `IF NOT EXISTS` guards make it idempotent. On rebuild, tables are dropped in the right order (FTS virtual table first, then content table). The FTS5 content-sync setup (`content=chunks, content_rowid=id`) is properly configured.
---

### expert_build/index_sources.py:_insert_chunks
VERDICT: CONCERN
CORRECTNESS: QUESTIONABLE
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: PARTIAL
INTEGRATION: WIRED
REASONING: The function inserts into the `chunks` content table but does not insert into `chunks_fts`. Because the FTS5 table is configured as a content-sync table (`content=chunks`), inserts to `chunks` do **not** automatically populate `chunks_fts` — that requires either FTS5 triggers or manual `INSERT INTO chunks_fts(rowid, text) VALUES(...)` statements, or a `rebuild` command afterward. The code relies on the `rebuild` command at the end of `cmd_index_sources` (line 112) to sync the FTS index, which works but means the FTS index is only valid after the full rebuild call. This is fine for the current usage pattern (batch insert then rebuild once), but `_insert_chunks` as a standalone function is misleading — callers must know to rebuild FTS afterward. Not a bug in practice since the only caller does rebuild, but worth documenting or restructuring.
---

### expert_build/index_sources.py:cmd_index_sources
VERDICT: CONCERN
CORRECTNESS: QUESTIONABLE
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Two issues:

1. **File encoding errors** (line 90): `source_path.read_text()` uses the platform default encoding and will raise `UnicodeDecodeError` on binary or non-UTF-8 files. No error handling exists, which will abort the entire indexing run. A `try/except` around the read with a skip+warning would be more robust.

2. **Duplicate chunks on re-run with new files**: When `rebuild=False` and some files are already indexed, the skip-existing logic (line 86) uses `str(source_path)` which is a relative path. If the working directory changes between runs, the same file gets a different string representation and will be re-indexed as a duplicate. The test `test_skips_already_indexed` only tests same-cwd re-runs, so this edge case isn't covered.

The `try/finally` ensuring `conn.close()` is good. The commit-then-rebuild-then-commit pattern is correct for content-sync FTS5.
---

### tests/test_index_sources.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Good coverage across the main features: markdown/python indexing, skip-already-indexed, rebuild, recursive/non-recursive, chunk type metadata, FTS5 search, large file chunking, and source_url extraction from frontmatter. The `test_large_file_chunked` test asserts exactly 2 chunks, which is correct given how `chunk_markdown` splits on `#` headings. The FTS5 search test (`test_fts5_search_works`) validates end-to-end that the content-sync rebuild works. Missing: no test for `.txt` files, no test for empty files, no test for files with frontmatter but no content, no test for `sys.exit(1)` when input dir doesn't exist.
---

### SELF_REVIEW
LIMITATIONS: Could not verify whether the `chunk_markdown` default `max_chars=25000` vs the CLI default `--chunk-size 2000` causes unexpected behavior — the chunkers' merging logic uses `max_chars` as a ceiling, so passing 2000 should work but may produce many small chunks for heading-heavy documents. Could not see the full `chunk_docs.py` `main` or other callers to verify the 2000 default is intentional vs the 25000 default in the chunkers themselves.
---

### FEATURE_REQUESTS
- Show the full source file for new files (not just the diff) when reviewing new modules — the diff and full file are identical for new files but having line numbers aids reference
- Include observation of existing tests for imported modules (e.g., `test_chunk_docs.py` if it exists) to check for overlapping coverage
---
