Modularity Review

Scope: ccgram (src/ccgram/) — post-Round-4 design audit. Branch modularity-decouple-round-4. ~42 KLOC, 14 handler subpackages, 5 agent providers, 2 LLM providers, optional Mini App.
Date: 2026-05-01

ccgram is a Telegram bot that drives AI coding agents (Claude Code, Codex, Gemini, Pi, plain shell) inside tmux windows; each Forum topic binds to one window and one agent session. Round 4 (the 6-phase refactor that just landed on this branch) addressed every finding from the 2026-04-29 review (F1–F6: handler subpackages, constructor DI for stores, bot.py split, window_tick decide/observe/apply, TelegramClient Protocol, lazy-import audit). The result is good bones, narrower context per task, but four residual hot spots that account for most of the remaining cross-module churn: a stateful "polling strategies" hub that doubles as the canonical home for pure decision types, two oversized handler files (command_orchestration.py 775 LOC, recovery_callbacks.py 890 LOC), the surviving module-level singleton pattern (now wrapped in proxies but still global), and a residual lazy-import gap (~89 in-function imports without the Round-4 documentation comment).

What Round 4 Already Bought You

These are no longer issues; do not re-litigate them.

Domain Classification

Largely unchanged from the prior review; the deltas worth noting are that messaging and Mini App have settled into supporting/core respectively, and that polling has emerged as a core subdomain in its own right (every UX iteration touches it).

Subsystem Type Volatility Comment
Telegram UX (handlers/)CoreHighWhere competitive value lives. Topic emoji, status bubble, recovery, voice still moving.
Provider abstraction (providers/)CoreHigh5 providers in 6 months; capability matrix continues to grow.
Polling / status detectionCoreHighHook + scrape + lifecycle interleaved; touched by every recent UX overhaul.
Inter-agent messaging (mailbox/)CoreMediumStabilising; new spawn flow + idle detection.
Mini App (miniapp/)CoreHighNew in v3.0; HTTP/WS surfaces still expanding.
Session monitoringSupportingMediumHooks vs hookless variants; events.jsonl surface stable.
State persistenceSupportingLowstate.json schema is stable; forward-compat by design.
tmux integration (tmux_manager.py)GenericLowtmux + libtmux APIs are static.
Telegram client (PTB)GenericLowAPI stable; F5 reduced exposure to outbound surface only.
LLM / WhisperGenericLow (functional)OpenAI-compatible HTTP shape; pluggable via Protocol.

Coupling Overview

The integrations below are the hot edges in the post-Round-4 graph — pairs where a change in one component is likeliest to ripple. Strength levels in the cells are linked to the relevant chapter.

Integration Strength Distance Volatility Balanced?
polling_strategies.py 5 module-level singletons → 12+ call sites Model sibling (mostly via lazy import) medium ❌ singleton hub
decide.py (pure) → polling_strategies.py for TickContext/STARTUP_TIMEOUT/is_shell_prompt Model sibling low ⚠ pure kernel still drags 1 073-line stateful module
command_orchestration.py — 4 jobs in one 775-LOC file (forward + menu sync + failure probe + status snapshot) Functional same module medium ❌ low cohesion
recovery_callbacks.py — dispatcher + 6 handlers + 3 keyboard builders + 2 scan helpers (890 LOC) Functional same module medium ❌ low cohesion
handlers/* → session_manager (44 sites) vs window_query/session_query (30 sites) Functional (mixed) sibling medium ⚠ two access patterns coexist
handlers/* → window_store/thread_router/user_preferences (proxy globals) Model sibling medium ⚠ proxy hides "wired vs unwired" failure mode
Tests → bootstrap.reset_for_testing() + 3 separate _reset_*_for_testing hooks Intrusive external (test → src) medium ❌ singleton-reset ceremony
Lazy-import sites (~89 undocumented out of 222 total) Intrusive (latent) sibling medium ❌ hidden cycles
bot.py __all__ re-exports of 19 moved symbols (test-patch targets) Model same module medium ⚠ load-bearing for legacy unittest.mock.patch
Handlers → telegram.{Update, InlineKeyboardButton, InlineKeyboardMarkup, CallbackQuery, MessageEntity} (inbound + UI) Model 3rd-party low ✅ tolerable per balance rule
TelegramClient Protocol → 18 outbound bot API methods Contract sub-package low ✅ exemplary
miniapp/ → rest of ccgram (only via its own auth module) Contract sub-package high ✅ best-isolated subsystem in the codebase
AgentProvider Protocol → 5 implementations + capability flags Contract sub-package high ✅ unchanged from prior review; remains the gold standard
bootstrap.py post_init → 7 wiring steps with ordering invariant Intrusive (necessary) sibling high ✅ now named + asserted (_callbacks_wired flag)

Issues

The four critical ones below are unbalanced and sit in volatile parts of the codebase. The two significant ones are unbalanced in medium-volatility areas. The minors are tolerable per the balance rule — listed for completeness, not for immediate work.

Issue: polling_strategies.py is a hidden singleton hub

Integration: handlers/polling/polling_strategies.py → 12+ lazy importers across recovery/, shell/, topics/, status/, live/, text/, window_tick/decide.py
Severity: Critical

Knowledge Leakage

polling_strategies.py is 1 073 lines and contains two unrelated kinds of thing in one module: (a) the pure data types TickContext, TickDecision, PaneTransition, the constant STARTUP_TIMEOUT, and the pure function is_shell_prompt — i.e. the contract that the F4 decision kernel was meant to be pure with respect to — and (b) five stateful module-level singletons: terminal_poll_state, terminal_screen_buffer, interactive_strategy, lifecycle_strategy, pane_status_strategy.

The F4 win in window_tick/decide.py ("zero deps on tmux/PTB/singletons") is accurate at the decision-function level — decide_tick(ctx) -> TickDecision is a deterministic mapping. But the import graph tells a different story: decide.py does from ..polling_strategies import STARTUP_TIMEOUT, TickContext, TickDecision, is_shell_prompt, which executes polling_strategies.py top-level at import time — instantiating all five singletons (pyte screen, RC debounce dict, pane-count cache, autoclose-timer dict, dead-notification set). The pure kernel is only pure once it's loaded; loading it is not.

This is model coupling where it should be contract coupling. Twelve other call sites compound the problem: most of them use from ..polling.polling_strategies import lifecycle_strategy (or terminal_screen_buffer, or interactive_strategy) inside function bodies as the documented workaround for the cycle this would otherwise create. That's eight # Lazy: comments admitting the same architectural fact.

Complexity Impact

Any change inside polling_strategies.py — adding a debounce field, renaming a method on TerminalScreenBuffer, tweaking STARTUP_TIMEOUT — has potential ripple into ≥12 sibling modules, most of which discovered the dependency late at import time. Cognitive load on a developer reading decide.py: small ✓. Cognitive load on a developer reading polling_strategies.py: 8 classes, 5 instances, 2 distinct-concept clusters, 1 073 lines — well past the 4 ± 1 working-memory budget.

The "lazy import everywhere" pattern means the static module graph lies: tooling sees decide.py → polling_strategies only, while the runtime graph fans out through every consumer of the singletons. That's the "hidden cycles" half of the problem the F6 audit was supposed to surface; it's still there, just better-commented.

Cascading Changes

Concrete recent triggers (from git log of this file):

Recommended Improvement

Split the file into two siblings inside handlers/polling/:

Then decide.py imports only from polling_types.py, eliminating the load-side effect for the pure kernel and making the F4 purity claim true at the import level too. The 12 lazy-import sites for the singletons stay lazy if they still need to (or, better, pass the strategy in via apply.py-style DI), but they no longer share a file with the contract.

Trade-off: ~1 day of mechanical work, ~30 import sites updated, no behaviour change. Risk is low; cycle-detection test will catch regressions. The payoff is that the F4 architecture finally pays its full dividend — decide.py becomes truly leaf-level and unit-testable without instantiating pyte buffers.

Issue: command_orchestration.py packs four unrelated jobs into 775 LOC

Integration: internal cohesion of handlers/command_orchestration.py
Severity: Critical

Knowledge Leakage

The module name suggests one job (orchestrating slash-command forwarding); the file does four:

  1. Forward unknown /commands to the agent session (forward_command_handler, ~110 LOC). The actual orchestration.
  2. Per-user / per-chat / global menu sync (sync_scoped_provider_menu, sync_scoped_menu_for_text_context, get_global_provider_menu, set_global_provider_menu, setup_menu_refresh_job). State management for BotCommandScope*.
  3. Post-send command-failure probing (_capture_command_probe_context, _probe_transcript_command_error, _maybe_send_command_failure_message, _spawn_command_failure_probe, _extract_pane_delta, _extract_probe_error_line, _command_known_in_other_provider). A separate concern — verifying that the agent actually accepted the command — that runs after forwarding.
  4. Status / stats snapshot fallback (_status_snapshot_probe_offset, _maybe_send_status_snapshot). A different feature entirely (showing /status output even when the agent doesn't understand the command).

These four jobs share file scope but not domain knowledge: changing menu-scope cache keys does not affect failure-probe extraction logic, and vice versa. This is low cohesion in the classic-coupling sense — coincidental grouping by surface ("things that involve /commands") rather than by reason-to-change.

Complexity Impact

Cascading Changes

Recommended Improvement

Split into a sibling subpackage handlers/commands/:

This is the same shape as the existing shell/ subpackage (shell_commands + shell_capture + shell_context + shell_prompt_orchestrator), where the cohesion-by-feature pattern already works.

Trade-off: ~half a day of mechanical work; one git-history-aware reviewer to make sure each split file's docstring is accurate. Behaviour unchanged. Payoff: the menu-sync logic and failure-probe logic become independently reviewable, and the next provider that demands a different command surface only touches forward.py.

Issue: recovery_callbacks.py is the largest handler at 890 LOC and conflates two unrelated UX flows

Integration: internal cohesion of handlers/recovery/recovery_callbacks.py
Severity: Critical

Knowledge Leakage

The file pairs two distinct UX surfaces:

  1. Dead-window recovery bannerRecoveryBanner dataclass, render_banner, _recovery_help_text, build_recovery_keyboard, _handle_back, _handle_fresh, _handle_continue, _handle_resume, _handle_browse, _handle_cancel. Triggered when a window dies and the topic is orphaned.
  2. Resume picker_SessionEntry, scan_sessions_for_cwd, _scan_index_for_cwd, _scan_bare_jsonl_for_cwd, _build_resume_picker_keyboard, _build_empty_resume_keyboard, _send_empty_state, _handle_resume_pick. Triggered from /resume or from the recovery banner's "Resume" button.

The two share a callback-data namespace and a _validate_recovery_state helper, but they have different state machines, different keyboard layouts, and different test surfaces. The shared dispatcher (_dispatch, handle_recovery_callback) is what justifies co-location, and it's ~50 LOC. The remaining ~840 LOC are the two flows.

This is low-cohesion / model coupling — the file groups by callback-prefix (rc:) rather than by reason-to-change. A change to how the resume picker paginates does not affect the banner; a change to what the banner offers when no transcript exists does not affect the picker.

Complexity Impact

Cascading Changes

Recommended Improvement

Split into two siblings inside handlers/recovery/:

The recovery/ __init__.py re-exports stay the same, so external call sites are unaffected.

Trade-off: ~half a day. The picker scan helpers (scan_sessions_for_cwd, _scan_index_for_cwd, _scan_bare_jsonl_for_cwd) are imported by resume_command.py already — moving them into resume_picker.py keeps the import path inside the subpackage. Behaviour unchanged. Payoff: the resume-picker pagination work that's already on the backlog (per CLAUDE.md UX overhaul history) lands in one ~400-LOC file instead of touching the largest handler in the codebase.

Issue: Module-level singletons survived as proxies, with a 3-step test-reset ceremony

Integration: WindowStateStore / ThreadRouter / UserPreferences / SessionMapSync (proxies) ↔ bootstrap.reset_for_testing() + 3 separate _reset_*_for_testing hooks
Severity: Significant

Knowledge Leakage

F2 successfully replaced _wire_singletons() monkey-patching with constructor injection inside SessionManager, and it added the loud-failing register_*_callback helpers. But the module-level globals window_store, thread_router, user_preferences, and session_map_sync did not go away — they became proxy objects (_WindowStoreProxy, _ThreadRouterProxy, etc.) that forward attribute access to the wired instance:

_active_store: WindowStateStore | None = None
window_store: WindowStateStore = cast("WindowStateStore", _WindowStoreProxy())

Every handler that previously did from .session import session_manager and then accessed session_manager.window_states continues to work because the proxy intercepts attribute access and re-routes to _active_store. This is better than monkey-patching — the failure mode is now an explicit RuntimeError("not wired") from the proxy — but it still expresses dependency injection as lookup of a global variable, not as parameter passing.

Concretely:

Complexity Impact

The proxy preserves call sites at the cost of:

  1. Lookup-time failure: a handler that touches window_store before SessionManager is constructed gets RuntimeError("not wired") at the call site, not at import time. Tests that import handlers and then forget to install a WindowStateStore get a misleading stack trace pointing at the handler, not at the test setup.
  2. Test parallelism is brittle: any test that mutates window_store mutates a process-wide global. pytest-xdist would require per-worker isolation that doesn't exist.
  3. Two overlapping APIs: a handler that needs to read window state can use either the window_query free functions or session_manager.window_states[...] directly. The query layer was meant to be the only read path; ~44 sites disagree.

This is the tight-coupling smell at the singleton boundary — strength is high (handlers know the singleton's name and method shape), distance is low (sibling), and volatility is medium (every UX feature touches at least one of these stores).

Cascading Changes

Recommended Improvement

Two-step migration, in order:

Step 1 — enforce one read path. Adopt a project rule: handlers read window/session state through window_query and session_query only. Migrate the 44 direct-session_manager.* call sites in handlers to the query layer. Leave bootstrap-time and admin paths (e.g. sync_command.py, sessions_dashboard.py) on the direct API. ~1 day.

Step 2 — pass the wired SessionManager into handlers that need write access. The handlers that genuinely need to mutate stores (topic creation, dead-window cleanup, provider switch) should receive the manager as a parameter — either through context.bot_data["session_manager"] (PTB-native) or via a small request-scoped context object. Once writes go through a passed dependency, the proxies become read-only re-exports, and reset_for_testing collapses to one call (or disappears entirely if tests use a fresh SessionManager() per test). ~3–5 days.

Trade-off: Step 1 is pure mechanical refactor with measurable benefit (one access pattern, fewer importers). Step 2 is the bigger commitment but is what actually retires the singleton-as-proxy pattern. Skip Step 2 if the test friction is tolerable; do Step 1 either way.

Issue: Lazy-import gap — 222 in-function imports vs 124 documented # Lazy: comments

Integration: latent cycles in src/ccgram/
Severity: Significant

Knowledge Leakage

Round 4's F6 phase claimed "160 intentional lazy imports documented" out of 251 originally inventoried. Reality, post-Round-4, on this branch:

That leaves ~89 in-function imports without an explicit reason comment. Some are inside test-reset hooks (legitimate; test-only) and inside functions that take a provider_name parameter (legitimate; they import per-provider format modules). But the F6 invariant — "every remaining lazy import is documented with # Lazy: <reason> citing the cycle path or wiring contract" — does not hold across the whole tree.

This is intrusive coupling of the latent kind: a deferred import is a runtime-only dependency edge that the static module graph misses. Each undocumented one is a place where a future refactor can change behaviour silently — the graph won't break, but cold-import timing or test-order dependence might.

Complexity Impact

  1. The test_import_no_cycles.py integration test parametrizes 29 modules. The remaining 90+ modules with lazy imports are not covered. A new cycle introduced through a deferred import will only show up if a test happens to be the first caller.
  2. Reviewers cannot tell, at a glance, whether a deferred import is intentional (cycle break) or accidental (someone forgot to hoist). The F6 comments were a contract; an undocumented deferred import silently breaks that contract.

Cascading Changes

Recommended Improvement

One mechanical pass:

  1. Add a CI check (make lint or a pre-commit hook) that greps for ^[[:space:]]+from \. and asserts each match is preceded by # Lazy:, if TYPE_CHECKING, or appears inside a _reset_*_for_testing function. The first prototype is ~40 lines of Python.
  2. Walk the 89 currently-undocumented sites: hoist the ones that don't actually break a cycle, and add # Lazy: to the rest with the cycle path or contract reason.
  3. Expand test_import_no_cycles.py to cover the full set of 50 top-level modules + the 14 handler subpackages. Currently 29; the other ~35 are equally likely to host a regression.

Trade-off: 2–3 hours for the lint check; ~1 day for the audit + test expansion. Behaviour unchanged. Payoff: the F6 contract becomes self-enforcing, and the cycle test becomes a real safety net rather than a sample.

Issue: bot.py __all__ re-exports 19 moved symbols for unittest.mock.patch compatibility

Integration: legacy tests → ccgram.bot
Severity: Minor

Knowledge Leakage

bot.py declares an __all__ list of 19 symbols (handler callables and singletons) that have all been moved to feature subpackages. The accompanying comment is candid: "Re-export the moved handler callables and supporting singletons so existing tests and integration suites that import them from ccgram.bot keep working without churn. Canonical homes are the feature subpackages — these names are retained for patch targets."

This is a compatibility shim for unittest.mock.patch("ccgram.bot.session_manager", ...)-style test code. It's working as designed and the prior review didn't flag it. But it's model coupling on the test side: the tests know that session_manager lives at ccgram.bot.session_manager — which it doesn't anymore.

Complexity Impact

Low. The shim is documented; new code uses canonical paths; the symbols listed are stable.

Cascading Changes

If session_manager ever moves out of ccgram.session (Step 2 of the singleton-retirement work above), the bot.py re-export silently keeps working, masking what should be a noisy test-update prompt.

Recommended Improvement

Trade-off: zero risk; defer until the tests get touched for other reasons.

Issue: Inbound PTB types still pervade handler signatures

Integration: 38+ handler modules → telegram.{Update, InlineKeyboardButton, InlineKeyboardMarkup, CallbackQuery, MessageEntity} and telegram.ext.ContextTypes
Severity: Minor

Knowledge Leakage

F5 isolated the outbound bot API behind a 18-method TelegramClient Protocol — that work is done and excellent. But the inbound surface is essentially unchanged: PTB calls handlers with Update and ContextTypes.DEFAULT_TYPE as positional arguments, so every handler signature still spells those types. Inline-keyboard construction also uses InlineKeyboardButton / InlineKeyboardMarkup directly — there's no facade.

Per the balance rule, this is tolerable: PTB's inbound API is a stable, low-volatility 3rd-party contract, so high-strength coupling against a low-distance-but-low-volatility external is balanced. The previous review acknowledged this trade-off explicitly.

Complexity Impact

Cognitive load for new contributors: medium. A new contributor reading any handler still has to know what Update, CallbackQuery, and ContextTypes look like. Because the outbound API is now hidden, the asymmetry is jarring: the handler reads Update.callback_query.data (PTB) and then calls client.edit_message_text(...) (the Protocol). Half the file uses one mental model, half uses another.

Cascading Changes

Effectively zero. PTB has not had a breaking change to the inbound types in the lifetime of this project.

Recommended Improvement

Do not invest. The cost is high (30+ files, custom IncomingUpdate / Keyboard types, factory functions) and the volatility is genuinely low. If the project ever migrates to a different Telegram framework, this becomes a real cost — but that's a hypothetical that doesn't justify pre-paying.

The one cheap win available: a ccgram.handlers.telegram_types re-export module that aliases the PTB types under project-local names (Update as IncomingUpdate, etc.). It costs nothing, makes the framework boundary visible in one file, and gives a single edit point if the situation ever changes. Worth doing during the next routine PTB upgrade.

Scoring (0–10)

The previous review scored 6.3 weighted; this review scores 7.1. The deltas are concentrated where Round 4 invested: subsystem locality, lifecycle clarity, abstraction quality, testability of pure logic. The two new entries (#21, #22) reflect what Round 4 didn't reach.

#Design POV04-2905-01Comment
1Module cohesion (single-responsibility)67Subpackages cohesive; command_orchestration, recovery_callbacks, polling_strategies outliers
2Coupling — overall (Balanced Coupling)67F2/F5 reduced strength; singleton hub remains
3Separation of concerns (UI vs domain)57decide.py pure; TelegramClient Protocol; UI primitives still inline
4Abstraction quality78+ TelegramClient Protocol; WindowView / decide_tick / CommandResult strong
5Dependency direction (acyclic)67Cycle-detection test added; lazy-import gap of ~89 sites remains
6Testability of pure logic78decide_tick unit-tested without mocks; FakeTelegramClient enables cheap handler tests
7Testability of integration logic57FakeTelegramClient is a real win; reset_for_testing ceremony still hurts
8Boundary discipline (3rd-party isolation)46Outbound PTB hidden behind Protocol; inbound types still leak (acknowledged trade-off)
9Provider extension cost99Unchanged — gold standard
10New Telegram command extension cost68handlers/registry.py CommandSpec table is the one edit point
11Lifecycle clarity68bootstrap.py named steps + ordering invariant + _callbacks_wired guard
12Configuration coupling55config singleton imported by 58 modules; no narrow Settings injection
13State management77window_query / session_query still mixed with direct session_manager.* (44 vs 30)
14Implicit-coupling risk (singletons)45Proxy pattern formalises the global; not eliminated
15Code duplication88Unchanged — _jsonl, expandable_quote, message_task continue to factor
16Subsystem locality (AI-agent context)58Big win — 14 cohesive subpackages, ~5–8× fewer files per task
17Documentation density99+ 124 # Lazy: comments document deferred imports
18Domain model purity67WindowView + decide.py + TelegramClient raise the bar; UI primitives still inline
19Cyclic risk67test_import_no_cycles.py covers 29 modules; needs to cover the rest
20Build / refactor velocity78Round 4 demonstrated capacity to ship a 6-phase refactor + 2 review iterations
21Test-infra debt (NEW)6reset_for_testing + 3 _reset_*_for_testing hooks + 12 test-files import them
22Hidden coupling hubs (NEW)5polling_strategies (5 singletons + pure types in 1 073 LOC); command_orchestration (4 jobs)
Weighted average6.37.1

7.1 reads as "good architecture, four hot spots left." The four critical findings (polling_strategies split, command_orchestration split, recovery_callbacks split, singleton-retirement Step 1) account for almost all the residual friction; together they are 2.5–3 days of mechanical work.

Recommended Order of Work

Pick by leverage. All five preserve behaviour.

  1. polling_strategies.pypolling_types.py + polling_state.py. ~1 day. Eliminates the singleton-hub problem and finally makes the F4 pure-kernel claim true at the import level.
  2. Single read path for window/session state — migrate 44 handler session_manager.* sites to window_query / session_query. ~1 day. Removes "two ways to do it" before any further DI work.
  3. Split recovery_callbacks.py into recovery_banner.py + resume_picker.py. ~half a day.
  4. Split command_orchestration.py into handlers/commands/ subpackage (forward / menu_sync / failure_probe / status_snapshot). ~half a day.
  5. Lazy-import lint check + audit pass. ~half a day for the lint script + 1 day for the audit.
  6. (Optional, multi-day) Singleton-retirement Step 2 — pass SessionManager through context.bot_data. Defer until test-friction warrants it.

What not to do:

Why these moves match the model

These moves are aligned with what the maintainer asked for: smaller focused contexts, faster execution, lower AI-agent cost — without touching the parts (provider abstraction, Mini App boundary, lifecycle wiring, pure decision kernel, TelegramClient Protocol) that already work.