# Ralph Progress Log

## Codebase Patterns

(Add reusable patterns discovered during iterations here)

---

## Log

(Iteration entries will be appended below)

---

## 2026-02-08 - US-001

**Story:** Add --no-fix flag to review commands

### What was implemented
- Added `--no-fix` boolean flag (default: False) to the `loop()` command in `src/ralph/commands/loop.py`
- Added `--no-fix` boolean flag (default: False) to the `review()` command in `src/ralph/commands/review.py`
- Threaded `no_fix` parameter through `_run_review_loop()` in `loop.py` (signature + call site)
- Help text: "Report review findings without applying automated fixes"

### Tests written
- No new tests needed for this story (flag plumbing only; behavior implementation is US-002)
- All 217 existing tests pass

### Files changed
- `src/ralph/commands/loop.py` - added `no_fix` param to `loop()` and `_run_review_loop()`
- `src/ralph/commands/review.py` - added `no_fix` param to `review()`
- `plans/TASKS.json` - marked US-001 as passes: true

### Learnings for future iterations
- Typer `--no-fix` automatically maps to `no_fix` Python parameter via Typer's flag naming convention
- The `_run_review_loop()` function in loop.py is the bridge between the loop command and the review pipeline; subsequent stories (US-002, US-003) will wire no_fix into ReviewLoopService

---

[Ralph Loop] 2026-02-08 05:55 UTC - Iteration 1: US-001 (Add --no-fix flag to review commands) completed

## 2026-02-08 - US-002

**Story:** Implement --no-fix behavior in ReviewLoopService

### What was implemented
- Added `no_fix` keyword parameter (default: False) to `ReviewLoopService.run_review_loop()` method signature
- When `no_fix=True`, the `FixLoopService.run_fix_loop()` call is skipped after a NEEDS_WORK verdict
- Logs `[Fix] Skipped (--no-fix)` via `logger.info()` when a fix is skipped
- Continues to the next reviewer after logging (pipeline is not aborted)

### Tests written
- `test_no_fix_skips_fix_loop_on_needs_work` - verifies FixLoopService is not instantiated when no_fix=True
- `test_no_fix_logs_skip_message` - verifies '[Fix] Skipped (--no-fix)' appears in log records
- `test_no_fix_continues_to_next_reviewer` - verifies both reviewers in a 2-reviewer pipeline execute when no_fix=True
- `test_no_fix_false_still_runs_fix_loop` - verifies default behavior (no_fix=False) still invokes fix loop

### Files changed
- `src/ralph/services/review_loop.py` - added `no_fix` param to `run_review_loop()`, conditional skip of fix loop
- `tests/test_review_loop.py` - added 4 new tests, `logging` and `pytest` imports
- `plans/TASKS.json` - marked US-002 as passes: true

### Learnings for future iterations
- The commands (loop.py, review.py) currently iterate reviewers manually rather than calling `run_review_loop()`. Future stories (US-003) that add no_fix summary output to commands will need to handle this in both places.
- `caplog` fixture with `caplog.at_level(logging.INFO)` is the clean way to assert log messages in tests.

---

[Ralph Loop] 2026-02-08 05:58 UTC - Iteration 2: US-002 (Implement --no-fix behavior in ReviewLoopService) completed

## 2026-02-08 - US-003

**Story:** Add --no-fix summary to review output

### What was implemented
- Added `fix_skipped` field (default: False) to `ReviewerResult` NamedTuple in `review_loop.py`
- In `ReviewLoopService.run_review_loop()`, set `fix_skipped=True` on results when `no_fix=True` and reviewer has NEEDS_WORK findings eligible for fix loop
- Updated summary display in `_run_review_loop()` in `loop.py` to show `[FINDINGS] reviewer: findings (not fixed)` for fix-skipped reviewers
- Updated summary display in `review()` in `review.py` with the same logic
- Both summaries include `Findings (not fixed): N` count in the summary line when skipped_fix > 0
- Both commands detect `no_fix` + NEEDS_WORK + eligible fix loop and mark results with `fix_skipped=True`

### Tests written
- `TestNoFixSummary::test_no_fix_sets_fix_skipped_on_needs_work_result` - verifies fix_skipped=True on NEEDS_WORK results in run_review_loop
- `TestNoFixSummary::test_no_fix_does_not_set_fix_skipped_on_passed_result` - verifies fix_skipped=False on PASSED results
- `TestNoFixSummary::test_no_fix_false_does_not_set_fix_skipped` - verifies default behavior leaves fix_skipped=False
- `TestNoFixSummary::test_no_fix_mixed_results_only_marks_needs_work` - verifies only NEEDS_WORK reviewers get fix_skipped
- `TestReviewNoFixSummary::test_no_fix_shows_findings_not_fixed_in_summary` - verifies CLI output shows "findings (not fixed)"
- `TestReviewNoFixSummary::test_no_fix_includes_skipped_fix_count_in_summary_line` - verifies "Findings (not fixed): 1" in summary line
- `TestReviewNoFixSummary::test_no_fix_not_set_does_not_show_findings_not_fixed` - verifies no spurious output without --no-fix

### Files changed
- `src/ralph/services/review_loop.py` - added `fix_skipped` field to ReviewerResult, set it in run_review_loop()
- `src/ralph/commands/loop.py` - added Verdict import, no_fix-aware result marking, updated summary display
- `src/ralph/commands/review.py` - added Verdict import, no_fix-aware result marking, updated summary display
- `tests/test_review_loop.py` - added TestNoFixSummary class with 4 tests
- `tests/test_review_command.py` - added TestReviewNoFixSummary class with 3 tests, updated _make_mock_service()
- `plans/TASKS.json` - marked US-003 as passes: true

### Learnings for future iterations
- Both `loop.py::_run_review_loop()` and `review.py::review()` manually iterate reviewers rather than calling `ReviewLoopService.run_review_loop()`, so changes to summary display must be made in both places
- NamedTuple fields with defaults must come after fields without defaults; `fix_skipped: bool = False` works at the end
- When using MagicMock for ReviewerResult in integration tests, auto-generated attributes (like fix_skipped) return MagicMock objects which are truthy — switching to real ReviewerResult instances avoids this

---

[Ralph Loop] 2026-02-08 06:04 UTC - Iteration 3: US-003 (Add --no-fix summary to review output) completed

## 2026-02-08 - US-004

**Story:** Create ReviewState Pydantic model

### What was implemented
- Created `ReviewState` Pydantic model in `src/ralph/models/review_state.py` with fields: `reviewer_names`, `completed`, `timestamp`, `config_hash`
- Added `compute_config_hash()` static method that produces a deterministic SHA-256 hash from a list of `ReviewerConfig` objects (captures name, skill, level, sorted languages)
- Added `save()` instance method to write state to a JSON file
- Added `load()` class method to read state from a JSON file (returns `None` for missing/invalid files)
- Added `REVIEW_STATE_FILENAME` constant (`.ralph-review-state.json`)
- Exported `ReviewState` and `REVIEW_STATE_FILENAME` from `src/ralph/models/__init__.py`

### Tests written
- `TestReviewStateModel::test_create_review_state` - verifies model creation with all fields
- `TestReviewStateModel::test_completed_defaults_to_empty_dict` - verifies default factory
- `TestReviewStateModel::test_serialization_round_trip` - verifies JSON serialization/deserialization
- `TestComputeConfigHash::test_hash_is_deterministic` - same config produces same hash
- `TestComputeConfigHash::test_hash_is_hex_string` - valid 64-char hex digest
- `TestComputeConfigHash::test_hash_changes_with_different_config` - different configs differ
- `TestComputeConfigHash::test_hash_changes_when_level_changes` - level change detected
- `TestComputeConfigHash::test_hash_empty_list` - empty list produces valid hash
- `TestComputeConfigHash::test_hash_language_order_irrelevant` - language list order normalized
- `TestSaveAndLoad::test_save_creates_json_file` - file written with correct content
- `TestSaveAndLoad::test_load_valid_file` - valid file loaded correctly
- `TestSaveAndLoad::test_load_nonexistent_returns_none` - missing file returns None
- `TestSaveAndLoad::test_load_invalid_json_returns_none` - malformed JSON returns None
- `TestSaveAndLoad::test_load_invalid_schema_returns_none` - wrong schema returns None
- `TestSaveAndLoad::test_save_load_round_trip` - save and load are inverse operations
- `TestReviewStateFilename::test_filename_value` - constant has expected value

### Files changed
- `src/ralph/models/review_state.py` - new file, ReviewState model
- `src/ralph/models/__init__.py` - added ReviewState and REVIEW_STATE_FILENAME exports
- `tests/test_review_state.py` - new file, 16 tests
- `plans/TASKS.json` - marked US-004 as passes: true

### Learnings for future iterations
- The manifest model (`manifest.py`) uses `by_alias=True` for camelCase JSON, but ReviewState uses snake_case throughout since `.ralph-review-state.json` is internal-only
- `compute_config_hash` sorts languages to ensure order-independence; this is important since `ReviewerConfig.languages` is a plain list
- Follow the `load`/`save` pattern from `manifest.py` for consistency: `load` returns `None` on failure, `save` writes with trailing newline

---

[Ralph Loop] 2026-02-08 - Iteration 4: US-004 (Create ReviewState Pydantic model) completed

[Ralph Loop] 2026-02-08 06:07 UTC - Iteration 4: US-004 (Create ReviewState Pydantic model) completed

## 2026-02-08 - US-005

**Story:** Implement resumable review logic in review commands

### What was implemented
- Added `--resume-review` boolean flag (default: False) to the `loop()` command in `src/ralph/commands/loop.py`
- Added `--resume-review` boolean flag (default: False) to the `review()` command in `src/ralph/commands/review.py`
- Threaded `resume_review` parameter through `_run_review_loop()` in `loop.py`
- Both commands load `ReviewState` from `.ralph-review-state.json` when `--resume-review` is set
- When state exists and config hash matches, already-completed reviewers are skipped
- When no state file exists or config hash mismatches, the full pipeline runs from the beginning
- State file is written/updated after each reviewer completes (via `ReviewState.save()`)

### Tests written
- `TestReviewResumeReview::test_resume_review_skips_completed_reviewers` - verifies completed reviewers from state file are skipped
- `TestReviewResumeReview::test_resume_review_runs_full_pipeline_without_state_file` - verifies full pipeline runs when no state exists
- `TestReviewResumeReview::test_resume_review_saves_state_after_each_reviewer` - verifies state file is created and updated
- `TestReviewResumeReview::test_resume_review_config_hash_mismatch_runs_full_pipeline` - verifies stale state is discarded

### Files changed
- `src/ralph/commands/loop.py` - added `resume_review` param to `loop()` and `_run_review_loop()`, added resume/state-save logic
- `src/ralph/commands/review.py` - added `resume_review` param to `review()`, added resume/state-save logic
- `tests/test_review_command.py` - added `TestReviewResumeReview` class with 4 tests
- `plans/TASKS.json` - marked US-005 as passes: true

### Learnings for future iterations
- The resume logic is implemented directly in both `_run_review_loop()` (loop.py) and `review()` (review.py) since both have their own reviewer iteration loops
- State file uses the existing `ReviewState` model from US-004 with `compute_config_hash()` for validation
- Config hash mismatch causes state to be discarded silently (logged but not errored)

---

[Ralph Loop] 2026-02-08 06:11 UTC - Iteration 5: US-005 (Implement resumable review logic in review commands) completed

## 2026-02-08 - US-006

**Story:** Handle state file cleanup and config invalidation

### What was implemented
- Added state file cleanup (`.ralph-review-state.json` deletion) after review loop completes in both `review.py` and `loop.py`'s `_run_review_loop()`
- Cleanup runs regardless of whether `--resume-review` was used — any leftover state file is removed
- Added visible console message when state is discarded due to config hash mismatch: "Review state discarded: reviewer configuration has changed"
- The config mismatch message appears in both `review.py` and `_run_review_loop()` in `loop.py`

### Tests written
- `TestReviewStateCleanup::test_state_file_cleaned_up_on_successful_completion` - verifies state file is deleted after all reviewers pass with `--resume-review`
- `TestReviewStateCleanup::test_state_file_cleaned_up_without_resume_flag` - verifies cleanup happens even without `--resume-review` flag
- `TestReviewStateCleanup::test_no_error_when_no_state_file_to_clean` - verifies no error when no state file exists to clean
- `TestReviewConfigInvalidation::test_config_change_logs_discard_message` - verifies console output includes "reviewer configuration has changed"
- `TestReviewConfigInvalidation::test_config_change_runs_all_reviewers_from_scratch` - verifies all reviewers run when config hash mismatches
- Updated `TestReviewResumeReview::test_resume_review_saves_state_after_each_reviewer` to account for post-completion cleanup

### Files changed
- `src/ralph/commands/review.py` - added state file cleanup on completion, added console message on config hash mismatch
- `src/ralph/commands/loop.py` - added state file cleanup in `_run_review_loop()`, added console message on config hash mismatch
- `tests/test_review_command.py` - added `TestReviewStateCleanup` (3 tests) and `TestReviewConfigInvalidation` (2 tests), updated 1 existing test
- `plans/TASKS.json` - marked US-006 as passes: true

### Learnings for future iterations
- When adding cleanup logic that removes files created during a loop, existing tests that assert file existence after the loop need to be updated to track intermediate writes (e.g., spy on `save()` calls)
- The `patch.object(ReviewState, "save", tracking_save)` pattern is useful for verifying intermediate save calls when the file gets cleaned up at the end

---

[Ralph Loop] 2026-02-08 06:15 UTC - Iteration 6: US-006 (Handle state file cleanup and config invalidation) completed

## 2026-02-08 - US-007

**Story:** Add .ralph-review-state.json to init scaffold

### What was implemented
- Added `create_gitignore()` method to `ScaffoldService` that creates or appends to `.gitignore`
- The method adds `.ralph-review-state.json` entry (using `REVIEW_STATE_FILENAME` constant)
- If `.gitignore` already exists, the entry is appended only if not already present
- Handles edge cases: missing trailing newline, empty file, duplicate prevention
- Updated `scaffold_all()` to call `create_gitignore()` and include `"gitignore"` in the result dict
- Imported `REVIEW_STATE_FILENAME` from `ralph.models.review_state`

### Tests written
- `TestCreateGitignore::test_creates_gitignore_when_not_exists` - verifies .gitignore is created with the entry
- `TestCreateGitignore::test_appends_to_existing_gitignore` - verifies entry is appended when .gitignore exists
- `TestCreateGitignore::test_does_not_duplicate_entry` - verifies no duplicate when entry already present
- `TestCreateGitignore::test_appends_with_newline_when_missing_trailing_newline` - verifies newline is added before entry
- `TestCreateGitignore::test_handles_empty_gitignore` - verifies entry is added to empty .gitignore
- `TestCreateGitignore::test_returns_gitignore_path` - verifies return value
- `TestScaffoldAllGitignore::test_scaffold_all_creates_gitignore` - verifies scaffold_all includes gitignore
- `TestScaffoldAllGitignore::test_scaffold_all_preserves_existing_gitignore` - verifies existing entries are preserved

### Files changed
- `src/ralph/services/scaffold.py` - added `create_gitignore()` method, updated `scaffold_all()`
- `tests/test_scaffold.py` - new file with 8 tests
- `plans/TASKS.json` - marked US-007 as passes: true

### Learnings for future iterations
- The `REVIEW_STATE_FILENAME` constant from `ralph.models.review_state` is reused to keep the gitignore entry in sync with the actual filename
- When appending to files, check for trailing newline to avoid malformed entries
- Line-based comparison (`splitlines()`) is more reliable than substring search for gitignore deduplication

---

[Ralph Loop] 2026-02-08 - Iteration 7: US-007 (Add .ralph-review-state.json to init scaffold) completed

[Ralph Loop] 2026-02-08 06:18 UTC - Iteration 7: US-007 (Add .ralph-review-state.json to init scaffold) completed

## 2026-02-08 - US-008

**Story:** Add codebase context to ralph tasks prompt

### What was implemented
- Added `_iter_file_tree()` generator function that walks the project directory, excluding common non-source dirs (.git, .venv, __pycache__, node_modules, etc.) with configurable max depth
- Added `_gather_codebase_summary()` function that produces a budget-capped summary containing a file tree and key file contents (pyproject.toml, CLAUDE.md, README.md, etc.)
- Extended `_build_prompt_from_skill()` to call `_gather_codebase_summary(Path.cwd())` and include the result in the prompt
- Added "Instructions for Already-Implemented Detection" section to the prompt instructing Claude to mark stories with `passes: true` when evidence exists in the codebase
- All codebase scanning is pure filesystem I/O — zero additional Claude API calls

### Tests written
- `TestIterFileTree` (5 tests): files/dirs listing, .git/.venv exclusion, max_depth respect, empty dir, egg-info exclusion
- `TestGatherCodebaseSummary` (6 tests): file tree section, pyproject.toml content, README content, missing key files, large file truncation, return type
- `TestBuildPromptIncludesCodebaseContext` (5 tests): codebase summary presence, already-implemented instructions, spec content preserved, branch name, file tree from cwd
- `TestCodebaseContextIntegration` (1 test): end-to-end tasks command sends codebase context to Claude

### Files changed
- `src/ralph/commands/tasks.py` - added _EXCLUDED_DIRS, _KEY_FILES, _MAX_SUMMARY_CHARS, _MAX_TREE_DEPTH constants; added _iter_file_tree(), _gather_codebase_summary(); modified _build_prompt_from_skill()
- `tests/test_tasks_codebase_context.py` - new file with 17 tests
- `plans/TASKS.json` - marked US-008 as passes: true

### Learnings for future iterations
- Using `Iterator[str]` yield pattern for file tree keeps memory usage low for large directories
- Budget-capping the codebase summary prevents oversized prompts; 12K chars is a reasonable limit
- The `_iter_file_tree` function sorts entries with directories first (using `p.is_file()` as sort key) for a cleaner tree display

---

[Ralph Loop] 2026-02-08 - Iteration 8: US-008 (Add codebase context to ralph tasks prompt) completed

[Ralph Loop] 2026-02-08 06:22 UTC - Iteration 8: US-008 (Add codebase context to ralph tasks prompt) completed

## 2026-02-08 - US-009

**Story:** Parse and log already-implemented story detection

### What was implemented
- Added `_log_already_implemented()` function to `src/ralph/commands/tasks.py`
- After parsing TASKS.json output from Claude, counts stories where `passes` is `True`
- Logs `[Tasks] N stories detected as already implemented` to the console
- For each already-implemented story, displays the story ID and title
- Notes on already-implemented stories are retained (function is read-only on the model)
- Called from `tasks()` after `save_tasks()` and before the "Generated N stories" message

### Tests written
- `TestLogAlreadyImplemented::test_logs_count_when_stories_detected` - verifies count in summary line (2 of 3 stories)
- `TestLogAlreadyImplemented::test_logs_each_story_id_and_title` - verifies US-001/US-002 IDs and titles appear
- `TestLogAlreadyImplemented::test_no_output_when_no_stories_detected` - verifies no output when all stories have passes=False
- `TestLogAlreadyImplemented::test_notes_are_retained_on_already_implemented_stories` - verifies notes are not mutated
- `TestLogAlreadyImplemented::test_summary_format_matches_spec` - verifies `[Tasks]` prefix format

### Files changed
- `src/ralph/commands/tasks.py` - added `_log_already_implemented()` function and call site in `tasks()`
- `tests/test_tasks_codebase_context.py` - added `TestLogAlreadyImplemented` class with 5 tests, added top-level imports
- `plans/TASKS.json` - marked US-009 as passes: true

### Learnings for future iterations
- Rich console output uses `capsys` for testing (not `caplog`) since `console.print()` writes to stdout
- The `\\[` escape is needed in Rich markup to display a literal `[` bracket (e.g., `\\[Tasks]`)

---

[Ralph Loop] 2026-02-08 - Iteration 9: US-009 (Parse and log already-implemented story detection) completed

[Ralph Loop] 2026-02-08 06:25 UTC - Iteration 9: US-009 (Parse and log already-implemented story detection) completed

## 2026-02-08 - US-010

**Story:** Bump version to 2.1.0 and update CHANGELOG

### What was implemented
- Updated version from `2.0.10` to `2.1.0` in `pyproject.toml`
- Updated version from `2.0.10` to `2.1.0` in `src/ralph/__init__.py`
- Added `[2.1.0] - 2026-02-08` changelog entry with sections for all three features:
  - `--no-fix` flag for audit-mode reviews (US-001 through US-003)
  - Resumable review pipeline with `--resume-review` (US-004 through US-007)
  - Already-implemented story detection in `ralph tasks` (US-008 through US-009)

### Tests written
- No new tests needed (version bump and changelog are documentation-only changes)
- All 283 existing tests pass

### Files changed
- `pyproject.toml` - version bumped to 2.1.0
- `src/ralph/__init__.py` - version bumped to 2.1.0
- `CHANGELOG.md` - added [2.1.0] section
- `plans/TASKS.json` - marked US-010 as passes: true

### Learnings for future iterations
- Version must be updated in both `pyproject.toml` and `src/ralph/__init__.py` to stay in sync
- Changelog entries should summarize the feature from a user perspective, not list individual stories

---

[Ralph Loop] 2026-02-08 06:26 UTC - Iteration 10: US-010 (Bump version to 2.1.0 and update CHANGELOG) completed

[Review] 2026-02-08 06:38 UTC - test-quality (blocking)

### Verdict: PASSED (all findings fixed)

### Findings

1. **TQ-001**: Redundant Test - Duplicate --no-fix summary tests across test_nofix_and_resume.py and test_review_command.py
   - File: tests/test_nofix_and_resume.py:227
   - Issue: `TestNoFixSummaryReviewCommand` (3 tests at lines 227-324) duplicates `TestReviewNoFixSummary` in test_review_command.py (3 tests at lines 339-465). Both test the same CLI output behavior through the review command with near-identical mock setups and assertions (e.g., checking "findings (not fixed)" in output, checking "Findings (not fixed): 2" count, and checking absence without --no-fix).
   - Suggestion: Remove `TestNoFixSummaryReviewCommand` from test_nofix_and_resume.py since test_review_command.py already provides comprehensive coverage of this behavior. The nofix_and_resume file should focus on the service-layer and loop.py tests that aren't covered elsewhere.

2. **TQ-002**: Redundant Test - Duplicate resume-review tests across test_nofix_and_resume.py and test_review_command.py
   - File: tests/test_nofix_and_resume.py:530
   - Issue: `TestResumeSkipsCompletedReviewCommand` (3 tests at lines 530-616) duplicates `TestReviewResumeReview` in test_review_command.py (4 tests at lines 468-559). Both test --resume-review skipping completed reviewers, running all when no state exists, and saving state after each reviewer, using nearly identical mock setups through the review command CLI.
   - Suggestion: Remove `TestResumeSkipsCompletedReviewCommand` from test_nofix_and_resume.py. The test_review_command.py version has better coverage (4 tests vs 3, including config hash mismatch).

3. **TQ-003**: Redundant Test - Duplicate state cleanup tests across test_nofix_and_resume.py and test_review_command.py
   - File: tests/test_nofix_and_resume.py:665
   - Issue: `TestStateCleanupReviewCommand` (3 tests at lines 665-744) duplicates `TestReviewStateCleanup` in test_review_command.py (3 tests at lines 594-682). Both test identical scenarios: cleanup with --resume-review, cleanup without the flag, and no error when state file doesn't exist.
   - Suggestion: Remove `TestStateCleanupReviewCommand` from test_nofix_and_resume.py since test_review_command.py already covers these scenarios identically.

4. **TQ-004**: Redundant Test - Duplicate config invalidation tests across test_nofix_and_resume.py and test_review_command.py
   - File: tests/test_nofix_and_resume.py:791
   - Issue: `TestConfigInvalidationReviewCommand` (2 tests at lines 791-849) duplicates `TestReviewConfigInvalidation` in test_review_command.py (2 tests at lines 685-746). Both test that stale config hash runs all reviewers and that a discard message is logged.
   - Suggestion: Remove `TestConfigInvalidationReviewCommand` from test_nofix_and_resume.py. The loop.py-specific tests in `TestConfigInvalidationInLoopRunReviewLoop` are the unique value this file adds.

5. **TQ-005**: Redundant Test - Duplicate ReviewState serialization tests across test_nofix_and_resume.py and test_review_state.py
   - File: tests/test_nofix_and_resume.py:384
   - Issue: `TestReviewStateSerialisation` (4 tests at lines 384-452) overlaps significantly with `TestSaveAndLoad` and `TestReviewStateModel` in test_review_state.py. For example, `test_round_trip_preserves_all_fields` (line 387) duplicates `test_save_load_round_trip` (test_review_state.py:227) and `test_serialization_round_trip` (test_review_state.py:62). Similarly, `test_load_returns_none_for_truncated_json` duplicates `test_load_invalid_json_returns_none`.
   - Suggestion: Remove `TestReviewStateSerialisation` from test_nofix_and_resume.py. The test_review_state.py file provides more thorough coverage of the ReviewState model with 16 dedicated tests.

6. **TQ-006**: Redundant Test - Duplicate config hash tests across test_nofix_and_resume.py and test_review_state.py
   - File: tests/test_nofix_and_resume.py:455
   - Issue: `TestReviewStateConfigHash` (6 tests at lines 455-522) overlaps with `TestComputeConfigHash` in test_review_state.py (6 tests at lines 80-165). Both test determinism, sensitivity to config changes, and language order normalization. The nofix_and_resume version adds `test_hash_differs_when_name_changes` and `test_hash_none_languages_vs_empty_list`, but the core tests are duplicated.
   - Suggestion: Move the 2 unique tests (`test_hash_differs_when_name_changes` and `test_hash_none_languages_vs_empty_list`) to test_review_state.py's `TestComputeConfigHash` class, then remove `TestReviewStateConfigHash` from test_nofix_and_resume.py.

7. **TQ-007**: Framework Test - Testing a constant's string value
   - File: tests/test_review_state.py:252
   - Issue: `test_filename_value` asserts `REVIEW_STATE_FILENAME == ".ralph-review-state.json"`. This tests that a Python constant equals its defined value, which is testing the language/framework behavior rather than business logic. This test cannot catch a real bug — if the constant value were intentionally changed, the test would just need updating.
   - Suggestion: Remove `TestReviewStateFilename` class. The constant's value is already exercised by save/load tests that use it.

---

## 2026-02-08 - US-011

**Story:** Add unit tests for --no-fix and resumable review features

### What was implemented
- Created comprehensive test file `tests/test_nofix_and_resume.py` with 30 tests across 12 test classes
- Tests cover both `review.py` and `loop.py` code paths (gap identified: `_run_review_loop()` in loop.py was not previously tested for new features)
- Organised tests by acceptance criteria into 6 sections matching the story requirements

### Tests written
- **TestNoFixSkipsFixLoopInService** (4 tests): FixLoopService not instantiated with no_fix=True, multiple NEEDS_WORK reviewers, default behavior still invokes fix, log messages per reviewer
- **TestNoFixSummaryReviewCommand** (3 tests): reviewer name in summary, multiple skipped-fix names, absence without --no-fix
- **TestNoFixSummaryLoopRunReviewLoop** (1 test): _run_review_loop shows "findings (not fixed)" with reviewer name
- **TestReviewStateSerialisation** (4 tests): round-trip preserves fields, truncated JSON returns None, save overwrites, valid JSON keys
- **TestReviewStateConfigHash** (6 tests): deterministic, skill change, name change, reviewer added, language order normalised, None vs empty languages
- **TestResumeSkipsCompletedReviewCommand** (3 tests): only remaining reviewers run, no state runs all, state saved after each
- **TestResumeInLoopRunReviewLoop** (1 test): _run_review_loop skips completed reviewers
- **TestStateCleanupReviewCommand** (3 tests): cleanup with resume, cleanup without resume, no error on missing file
- **TestStateCleanupInLoopRunReviewLoop** (1 test): state file removed after completion
- **TestConfigInvalidationReviewCommand** (2 tests): stale hash runs all reviewers, stale hash logs message
- **TestConfigInvalidationInLoopRunReviewLoop** (2 tests): stale hash runs all, stale hash console message

### Files changed
- `tests/test_nofix_and_resume.py` - new file, 30 tests
- `plans/TASKS.json` - marked US-011 as passes: true

### Learnings for future iterations
- Both `review.py` and `loop.py` implement reviewer iteration independently; tests should cover both paths
- The `_run_review_loop()` function in loop.py can be tested directly by patching `load_reviewer_configs`, `detect_languages`, and `ReviewLoopService`
- Rich console output can be captured with a StringIO-backed Console for assertions in unit tests

---

[Review] 2026-02-08 06:43 UTC - test-quality (blocking)

### Verdict: PASSED
---

[Review] 2026-02-08 06:44 UTC - code-simplifier (blocking)

### Verdict: PASSED

### Findings

1. **CS-001**: Redundant Code - Duplicate guard clauses in should_run_reviewer
   - File: src/ralph/services/review_loop.py:78
   - Issue: Two consecutive checks `if reviewer.languages is None: return True` followed by `if not reviewer.languages: return True` are redundant. The second check (`not reviewer.languages`) already handles both `None` and empty list `[]` since both are falsy in Python.
   - Suggestion: Consolidate into a single `if not reviewer.languages: return True` check. This preserves the exact same behavior while removing the redundant branch.

2. **CS-002**: Redundant Code - Duplicated review loop logic between loop.py and review.py
   - File: src/ralph/commands/loop.py:596
   - Issue: The `_run_review_loop()` function in loop.py (lines 596-725) and the reviewer iteration block in `review()` in review.py (lines 202-323) contain ~130 lines of nearly identical code: language filter checks, resume state loading, no-fix handling, result collection, summary display, and state file cleanup. This duplication was noted in PROGRESS.txt entries for US-003 and US-005 as a known issue.
   - Suggestion: Extract the shared review execution and summary display logic into a shared function or method (e.g., on ReviewLoopService or a dedicated helper module). Both commands would call this shared implementation, reducing the maintenance burden when adding new review features. This is a significant refactoring opportunity for a future iteration.

3. **CS-003**: Redundant Code - Duplicated test helper functions across test files
   - File: tests/test_nofix_and_resume.py:31
   - Issue: The `working_directory` context manager is duplicated in 4 test files (test_review_force.py, test_nofix_and_resume.py, test_review_command.py, test_commands_integration.py). Similarly, `_make_reviewers`, `_make_mock_service`, and `_create_reviewer` helpers are duplicated across 2-3 test files each.
   - Suggestion: Move shared test helpers to `tests/conftest.py` as fixtures or to a `tests/helpers.py` utility module. This would eliminate ~60 lines of duplication and ensure consistency when helpers need updating.

---

[Review] 2026-02-08 06:44 UTC - code-simplifier (blocking)

### Verdict: PASSED
---

[Review] 2026-02-08 06:46 UTC - python-code (blocking)

### Verdict: PASSED

### Findings

(No issues found)

---

[Review] 2026-02-08 06:46 UTC - python-code (blocking)

### Verdict: PASSED
---

[Review] 2026-02-08 06:50 UTC - github-actions (warning)

### Verdict: PASSED

### Workflow Coverage

| Workflow Type | Present | File |
|---------------|---------|------|
| Lint/Format | ✅ | ci.yml |
| Unit Tests | ✅ | ci.yml |
| Build Check | ✅ | ci.yml (via uv sync) |
| Integration Tests | N/A | - |
| Dependabot | ✅ | dependabot.yml |
| Security Scan | ✅ | security.yml |

### Findings

(No issues found)

---

[Review] 2026-02-08 06:47 UTC - github-actions (warning)

### Verdict: PASSED
---

[Review] 2026-02-08 17:50 UTC - repo-structure (warning)

### Verdict: PASSED

### Findings

(No issues found)

---

[Review] 2026-02-08 06:47 UTC - repo-structure (warning)

### Verdict: PASSED
---

[Review] 2026-02-08 17:55 UTC - release (blocking)

### Verdict: NEEDS_WORK

### Findings

1. **RL-001**: Dirty Working Directory - Uncommitted changes present
   - File: plans/PROGRESS.txt:0
   - Issue: Working directory has uncommitted changes in 5 files: plans/PROGRESS.txt (staged), plans/SPEC.md, tests/test_nofix_and_resume.py, tests/test_review_state.py, uv.lock (unstaged modified), and plans/PROGRESS.20260208_055328.txt (untracked). All changes must be committed before release.
   - Suggestion: Stage and commit all pending changes, or discard any that are not intended for the release.

2. **RL-002**: Archive Cleanup - PROGRESS.txt archive file found
   - File: plans/PROGRESS.20260208_055328.txt:0
   - Issue: Found archived PROGRESS file that should be deleted before release.
   - Suggestion: Delete plans/PROGRESS.20260208_055328.txt or move it out of the plans/ directory.

---

[Review] 2026-02-08 06:49 UTC - release (blocking)

### Verdict: PASSED
---

## 2026-02-08 - US-012

**Story:** Add unit tests for already-implemented story detection

### What was implemented
- Created `tests/test_already_implemented.py` with 19 tests across 4 test classes
- Tests organised by acceptance criteria: codebase context in prompt, counting/logging, message format, normal operation
- Both unit-level (`_log_already_implemented`, `_build_prompt_from_skill`) and end-to-end (`tasks` CLI command via CliRunner) coverage

### Tests written
- **TestCodebaseSummaryInPrompt** (5 tests): prompt heading, detection instructions, file tree, key file contents, end-to-end via CLI
- **TestAlreadyImplementedCounting** (6 tests): 2-of-3 counting, single story, all stories, story ID/title logging, notes retention, end-to-end via CLI
- **TestSummaryMessageFormat** (4 tests): [Tasks] prefix, full format string, ID:title format, end-to-end via CLI
- **TestNoStoriesDetected** (4 tests): no output with all passes=False, empty story list, CLI omits message, CLI completes normally

### Files changed
- `tests/test_already_implemented.py` - new file, 19 tests
- `plans/TASKS.json` - marked US-012 as passes: true
- `plans/PROGRESS.txt` - appended this entry

### Learnings for future iterations
- Shared test helpers (`working_directory`, `_make_story`, `_make_tasks_file`) are duplicated across test files; a future cleanup could extract them to `tests/conftest.py`
- Rich console output renders `\[Tasks]` as `[Tasks]` in the terminal; `capsys` and `CliRunner.output` both capture the rendered text (no Rich markup)

---

[Ralph CLI] 2026-02-08 06:55 UTC - US-012 (Add unit tests for already-implemented story detection) completed successfully - All stories complete!

## 2026-02-08 - US-001

**Story:** Add package metadata to pyproject.toml

### What was implemented
- Added PyPI classifiers for Python 3.11, 3.12, 3.13, MIT license, and Development Tools topics
- Added `[project.urls]` section with Homepage, Repository, Changelog, and Issue Tracker links pointing to the GitHub repo
- Verified `uv build` produces valid sdist (`ralph_cli-2.1.0.tar.gz`) and wheel (`ralph_cli-2.1.0-py3-none-any.whl`)
- Confirmed metadata is embedded correctly in wheel METADATA file

### Tests written
- No new tests needed (metadata-only change to pyproject.toml; verified via uv build + METADATA inspection)
- All 312 existing tests pass

### Files changed
- `pyproject.toml` - added `classifiers` list and `[project.urls]` section
- `plans/TASKS.json` - marked US-001 as passes: true

### Learnings for future iterations
- PyPI classifiers must use exact strings from the PyPI classifier list (e.g., "Topic :: Software Development :: Build Tools")
- `[project.urls]` keys with spaces need quoting in TOML (e.g., `"Issue Tracker"`)
- `unzip -p` on wheel files can inspect dist-info/METADATA to verify embedded metadata without installing

---

[Ralph Loop] 2026-02-08 07:26 UTC - Iteration 1: US-001 (Add package metadata to pyproject.toml) completed

## 2026-02-08 - US-002

**Story:** Add version consistency check to CI

### What was implemented
- Added `version-check` job to `.github/workflows/ci.yml` that runs on push to main and PRs targeting main
- Job extracts version from `pyproject.toml` using a python3 one-liner with regex
- Job extracts `__version__` from `src/ralph/__init__.py` using a python3 one-liner with regex
- Job checks `CHANGELOG.md` for a `## [X.Y.Z]` header matching the pyproject.toml version using `grep -qF`
- Final verification step compares all three and fails with `::error` annotations identifying exactly which files mismatch
- Job runs without installing project dependencies (only python3 stdlib and grep)

### Tests written
- No new tests needed (CI workflow YAML change only; verified by lint/format checks passing)
- All 312 existing tests pass

### Files changed
- `.github/workflows/ci.yml` - added `version-check` job
- `plans/TASKS.json` - marked US-002 as passes: true

### Learnings for future iterations
- GitHub Actions `::error file=<path>::<message>` annotations surface directly on PR file diffs
- Python3 is available on `ubuntu-latest` runners without any setup step
- Using `grep -qF` for fixed-string matching avoids regex escaping issues with brackets in changelog headers

---

[Ralph Loop] 2026-02-08 - Iteration 2: US-002 (Add version consistency check to CI) completed

[Ralph Loop] 2026-02-08 07:28 UTC - Iteration 2: US-002 (Add version consistency check to CI) completed

## 2026-02-08 - US-003

**Story:** Add tag-version validation to version check

### What was implemented
- Extended CI push trigger to also fire on tag pushes matching `v*`
- Added "Verify tag matches pyproject.toml version" step to the version-check job
- Step uses `if: startsWith(github.ref, 'refs/tags/v')` so it runs only on tag pushes
- On tag runs, strips the `v` prefix from `GITHUB_REF` and compares to the pyproject.toml version extracted earlier
- On mismatch, fails with a `::error::` annotation clearly showing the tag vs pyproject.toml values
- On non-tag runs (push to main, PRs), the step is skipped entirely (passes silently)

### Tests written
- No new tests needed (CI workflow YAML change only; verified by lint/format/typecheck/test checks passing)
- All 312 existing tests pass

### Files changed
- `.github/workflows/ci.yml` - added `tags: ['v*']` to push trigger, added tag validation step
- `plans/TASKS.json` - marked US-003 as passes: true

### Learnings for future iterations
- GitHub Actions `if: startsWith(github.ref, 'refs/tags/v')` is the clean way to conditionally run steps only on tag pushes
- `${GITHUB_REF#refs/tags/v}` shell parameter expansion strips the prefix to get the bare version number
- The existing `steps.pyproject.outputs.version` output can be reused in subsequent steps without re-extraction

---

[Ralph Loop] 2026-02-08 - Iteration 3: US-003 (Add tag-version validation to version check) completed

[Ralph Loop] 2026-02-08 07:30 UTC - Iteration 3: US-003 (Add tag-version validation to version check) completed

## 2026-02-08 - US-004

**Story:** Create publish workflow with build and TestPyPI jobs

### What was implemented
- Created `.github/workflows/publish.yml` triggered on tag pushes matching `v*`
- Set workflow-level permissions to `id-token: write` (for OIDC) and `contents: write` (for releases)
- Added `build` job that installs uv, runs `uv build`, and uploads `dist/` via `actions/upload-artifact`
- Added `testpypi` job with `needs: build` that downloads artifacts and publishes to TestPyPI using `pypa/gh-action-pypi-publish` with Trusted Publishers (OIDC)
- testpypi job uses a `testpypi` GitHub Actions environment with URL pointing to `https://test.pypi.org/p/ralph-cli`
- testpypi job declares its own `id-token: write` permission for OIDC token generation
- All 5 action versions pinned to full 40-character commit SHAs with version comments

### Tests written
- No new tests needed (GitHub Actions workflow YAML change only; verified by all 4 quality checks passing)
- All 312 existing tests pass

### Files changed
- `.github/workflows/publish.yml` - new file, publish workflow with build and testpypi jobs
- `plans/TASKS.json` - marked US-004 as passes: true

### Learnings for future iterations
- GitHub Actions annotated tags need dereferencing via `git/tags/<sha>` API to get the actual commit SHA for pinning
- The `pypa/gh-action-pypi-publish` action uses `release/v1` as its tag namespace, with specific versions like `v1.13.0`
- Trusted Publishers require both the `id-token: write` permission and a matching GitHub Actions environment name configured in PyPI/TestPyPI

---

[Ralph Loop] 2026-02-08 07:33 UTC - Iteration 4: US-004 (Create publish workflow with build and TestPyPI jobs) completed

## 2026-02-08 - US-005

**Story:** Add production PyPI publish job to workflow

### What was implemented
- Added `pypi` job to `.github/workflows/publish.yml` that publishes to production PyPI
- Job uses `pypa/gh-action-pypi-publish` with Trusted Publishers (OIDC) via `id-token: write` permission
- Job depends on `testpypi` job completing successfully (`needs: testpypi`)
- Job uses a separate `pypi` GitHub Actions environment with URL `https://pypi.org/p/ralph-cli` for deployment protection
- Action version pinned to same full commit SHA as testpypi job (`ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e`)
- No `repository-url` override needed since production PyPI is the default target

### Tests written
- No new tests needed (GitHub Actions workflow YAML change only; verified by all 4 quality checks passing)
- All 312 existing tests pass

### Files changed
- `.github/workflows/publish.yml` - added `pypi` job
- `plans/TASKS.json` - marked US-005 as passes: true

### Learnings for future iterations
- Production PyPI is the default target for `pypa/gh-action-pypi-publish` — no `repository-url` is needed (unlike TestPyPI which requires the override)
- Each publish environment (testpypi, pypi) should have its own `id-token: write` permission block for OIDC
- Using separate GitHub Actions environments enables deployment protection rules (e.g., manual approval) per target

---

[Ralph Loop] 2026-02-08 07:35 UTC - Iteration 5: US-005 (Add production PyPI publish job to workflow) completed

## 2026-02-08 - US-006

**Story:** Add GitHub Release creation job to workflow

### What was implemented
- Added `release` job to `.github/workflows/publish.yml` with `needs: pypi` dependency
- Job checks out the repo and extracts the version number from the git tag (strips `v` prefix)
- Uses `awk` to extract release notes from CHANGELOG.md between the current version header and the next `## [` header
- Downloads the `python-package-distributions` artifact to `dist/`
- Determines whether the release is stable or pre-release by checking for pre-release suffixes (a, b, rc, dev, alpha, beta, pre)
- Creates a GitHub Release using `gh release create` with the tag name as title, extracted notes as body, and `dist/*` as attached assets
- Stable versions get `--latest` flag, pre-release versions get `--prerelease` flag
- Uses the existing `contents: write` permission and `github.token` for authentication
- Action versions reuse the same pinned commit SHAs as other jobs (checkout, download-artifact)

### Tests written
- No new tests needed (GitHub Actions workflow YAML change only; verified by all 4 quality checks passing)
- All 312 existing tests pass

### Files changed
- `.github/workflows/publish.yml` - added `release` job
- `plans/TASKS.json` - marked US-006 as passes: true

### Learnings for future iterations
- `gh release create` with `--notes-file` is simpler than `softprops/action-gh-release` and avoids pinning another third-party action SHA
- The `--latest` and `--prerelease` flags on `gh release create` control the "Latest" badge on the GitHub Releases page
- `awk` with found flag pattern is a clean way to extract sections from markdown between headers
- `GH_TOKEN` environment variable is the preferred way to authenticate `gh` CLI in Actions (vs `--token` flag)

---

[Ralph Loop] 2026-02-08 - Iteration 6: US-006 (Add GitHub Release creation job to workflow) completed

[Ralph Loop] 2026-02-08 07:37 UTC - Iteration 6: US-006 (Add GitHub Release creation job to workflow) completed

## 2026-02-08 - US-007

**Story:** Gate publish workflow on CI checks passing

### What was implemented
- Added `ci-check` job to `.github/workflows/publish.yml` as the first job in the pipeline
- Job uses `gh api` to query the CI workflow (`ci.yml`) runs for the exact commit SHA (`GITHUB_SHA`)
- Polls up to 30 attempts with 30-second intervals (15-minute total timeout) for CI to complete
- Three clear failure modes with `::error::` annotations:
  - CI workflow has not run for the commit
  - CI workflow completed with a non-success conclusion
  - CI workflow still running after timeout
- On success, prints the commit SHA and exits cleanly
- Added `needs: ci-check` to the `build` job so the entire publish pipeline gates on CI passing

### Tests written
- No new tests needed (GitHub Actions workflow YAML change only; verified by all 4 quality checks passing)
- All 312 existing tests pass

### Files changed
- `.github/workflows/publish.yml` - added `ci-check` job, added `needs: ci-check` to `build` job
- `plans/TASKS.json` - marked US-007 as passes: true

### Learnings for future iterations
- `gh api` with `--jq` can directly extract specific JSON fields from the GitHub API response
- `head_sha` query parameter on the workflow runs endpoint filters runs by the exact commit SHA
- Using `jq -r` on the raw JSON response (separate from `--jq`) provides flexibility for multiple field extraction from the same response
- The `github.token` in GitHub Actions has read access to workflow runs by default (no extra permissions needed)

---

[Ralph Loop] 2026-02-08 07:39 UTC - Iteration 7: US-007 (Gate publish workflow on CI checks passing) completed

## 2026-02-08 - US-008

**Story:** Create branch protection setup script

### What was implemented
- Created `scripts/setup-branch-protection.sh` that configures branch protection for main via `gh api`
- Script accepts optional `OWNER/REPO` argument, defaults to current repo via `gh repo view`
- Uses `gh api --method PUT repos/{owner}/{repo}/branches/main/protection` with JSON payload
- Requires at least 1 approving PR review (`required_approving_review_count: 1`)
- Requires all 8 status checks: 6 CI test matrix combinations, `version-check`, and `CodeQL Analysis`
- Requires branches to be up to date before merging (`strict: true`)
- Prevents force pushes (`allow_force_pushes: false`)
- Enforces rules for administrators (`enforce_admins: true`)
- Script is executable (`chmod +x`) with comprehensive header comments (usage, args, prerequisites, example)

### Tests written
- No new tests needed (shell script only; verified by all 4 quality checks passing)
- All 312 existing tests pass

### Files changed
- `scripts/setup-branch-protection.sh` - new file, branch protection setup script
- `plans/TASKS.json` - marked US-008 as passes: true

### Learnings for future iterations
- GitHub branch protection API uses PUT on `repos/{owner}/{repo}/branches/{branch}/protection`
- `required_status_checks.strict: true` enforces "require branches to be up to date before merging"
- Status check context names for matrix jobs follow the pattern `job-name (matrix-value-1, matrix-value-2)`
- `restrictions: null` is required in the payload (cannot be omitted) to not restrict pushes to specific users/teams

---

[Ralph Loop] 2026-02-08 18:40 UTC - Iteration 8: US-008 (Create branch protection setup script) completed

[Ralph Loop] 2026-02-08 07:41 UTC - Iteration 8: US-008 (Create branch protection setup script) completed

## 2026-02-08 - US-009

**Story:** Document Trusted Publisher setup steps

### What was implemented
- Created `PUBLISHING.md` with complete Trusted Publisher setup documentation
- Documented PyPI configuration: project name (`ralph-cli`), owner (`jackemcpherson`), repo (`ralph-cli`), workflow (`publish.yml`), environment (`pypi`)
- Documented TestPyPI configuration separately with its own environment name (`testpypi`)
- Included a quick-start checklist covering project creation, Trusted Publisher setup, environment creation, and first tag push
- Added sections for how the pipeline works, GitHub Actions environment setup, triggering a release, and troubleshooting common issues
- Added references to PyPI Trusted Publishers docs, pypa/gh-action-pypi-publish, and GitHub OIDC docs

### Tests written
- No new tests needed (documentation-only change; verified by all 4 quality checks passing)
- All 312 existing tests pass

### Files changed
- `PUBLISHING.md` - new file, Trusted Publisher setup guide
- `plans/TASKS.json` - marked US-009 as passes: true

### Learnings for future iterations
- Trusted Publisher configuration requires exact match on all 5 fields (project name, owner, repo, workflow name, environment name)
- PyPI and TestPyPI are independent registries; each needs its own Trusted Publisher entry
- The environment name in the workflow YAML must match the environment name configured in the Trusted Publisher on PyPI/TestPyPI

---
