## Code Review: `index-sources` command

### expert_build/cli.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: CLI argument parser and dispatch are correctly wired. The `_lazy` call follows the existing pattern (`_lazy("index_sources", "cmd_index_sources")`), matching the module name `index_sources.py` and the exported function. Argument definitions align with what `cmd_index_sources` reads via `getattr`/attribute access. The alphabetical insertion in the dispatch dict is consistent with the surrounding entries.
---

### expert_build/index_sources.py:_init_db
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Table creation and rebuild logic are correct. The FTS5 content-sync table (`content=chunks, content_rowid=id`) is properly configured. Drop order (FTS first, then content table) is correct for avoiding FK issues. `IF NOT EXISTS` guards make it safe for repeated calls without `rebuild`.
---

### expert_build/index_sources.py:_insert_chunks
VERDICT: CONCERN
CORRECTNESS: QUESTIONABLE
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: PARTIAL
INTEGRATION: WIRED
REASONING: Two issues: (1) `INSERT INTO chunks_fts(chunks_fts) VALUES('rebuild')` is called after every file's batch of chunks (line 53). This is an O(n) full-index rebuild command — calling it per-file makes the overall operation O(n*m) where m is total chunks. It should be called once at the end after all files are indexed, not per-file. For small corpora this is tolerable but it's a latent performance trap. (2) The function commits twice per call (lines 50 and 53) — the first commit is unnecessary since the second one covers both. Neither issue is a correctness bug, but the FTS rebuild-per-file is worth fixing.
---

### expert_build/index_sources.py:cmd_index_sources
VERDICT: CONCERN
CORRECTNESS: QUESTIONABLE
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Three observations: (1) The `conn.close()` on line 115 is not in a `try/finally` or context manager — if any file read or chunk call raises, the connection leaks. (2) The skip-if-already-indexed check compares `str(source_path)` against stored filenames, but `source_path` is a `Path` object from glob. On the first index run, `_insert_chunks` receives a `Path` object and `str(filename)` converts it (line 47). This works, but the stored path is absolute (from `tmp_path` in tests) or relative depending on what `--input-dir` was. If the user changes `--input-dir` from `./sources` to `sources` between runs, the skip logic silently re-indexes everything. This is a minor fragility. (3) The `getattr` calls (lines 72-74) are unnecessary — `argparse` guarantees these attributes exist on the namespace since they have defaults. Direct attribute access (`args.rebuild`, `args.type`, `args.chunk_size`) would be clearer and consistent with how `args.input_dir` and `args.db` are accessed on lines 69-70.
---

### tests/test_index_sources.py
VERDICT: PASS
CORRECTNESS: VALID
SPEC_COMPLIANCE: N/A
ISSUE_COMPLIANCE: N/A
BELIEF_COMPLIANCE: N/A
TEST_COVERAGE: COVERED
INTEGRATION: WIRED
REASONING: Good coverage across the feature surface: table creation, rebuild, markdown/python indexing, skip-already-indexed, recursive vs non-recursive, chunk type metadata, FTS5 search, large file chunking, and frontmatter URL extraction. The `test_large_file_chunked` test asserts exactly 2 chunks for a file with two H1 sections of ~3000 chars each with `chunk_size=2000`, which aligns with `chunk_markdown` splitting on `#{1,2}` boundaries. The `work_dir` fixture with `monkeypatch.chdir` properly isolates tests. One minor gap: no test for `.txt` files hitting the `chunk_fixed` path, and no test for empty files being skipped (line 97 `if not content.strip(): continue`), but these are low-risk paths.
---

## Summary

The implementation is solid and well-tested. Two items worth addressing:

1. **Performance**: Move the `INSERT INTO chunks_fts(chunks_fts) VALUES('rebuild')` call out of `_insert_chunks` and into `cmd_index_sources` after the loop. Currently it rebuilds the entire FTS index once per file.

2. **Resource safety**: Wrap the connection in `cmd_index_sources` with `try/finally` or use `conn` as a context manager to ensure cleanup on exceptions.

The unnecessary `getattr` calls are a minor style issue — not worth blocking on but easy to clean up.

### SELF_REVIEW
LIMITATIONS: Could not verify whether `chunk_markdown` with `max_chars=2000` reliably produces exactly 2 chunks for the test's input (two ~3000-char sections) — the chunking functions use the default `max_chars=25000` signature, and the test passes `chunk_size=2000` through `args` which is correctly forwarded. Also could not verify whether other commands in the project use a similar FTS pattern to check for consistency.
---

### FEATURE_REQUESTS
- Show the full module's `__init__.py` or package exports to verify the new module is importable without extra registration
- When a function is called in a loop, flag it if it contains operations that should be hoisted out (like the FTS rebuild)
---
