# .rules for MCP Server Development

This file defines **hard** (must-follow) and **soft** (recommended) rules for MCP server development with mcp-refcache. These guidelines maximize code quality, safety, and effective AI assistant collaboration.

---

## Model Context Lengths

Reference for AI model context windows. Monitor token usage and plan handoffs accordingly.

### Anthropic Models

| Model | Hard Limit | Soft Limit (Zed) | Notes |
|-------|------------|------------------|-------|
| Claude Opus 4.5 | 200k | 128k | High capability, expensive |
| Claude Sonnet 4 | 200k | 128k | Balanced capability/cost |
| Claude Sonnet 3.5 | 200k | 128k | Previous gen |
| Claude Haiku 3.5 | 200k | 128k | Fast, cost-effective |

### OpenAI Models

| Model | Hard Limit | Soft Limit (Zed) | Notes |
|-------|------------|------------------|-------|
| GPT-4o | 128k | 64k | OpenAI flagship |
| GPT-4o-mini | 128k | 128k | Cost-effective |
| GPT-4 Turbo | 128k | 128k | OpenAI |
| GPT-4 | 8k / 32k | 8k / 32k | Legacy versions |
| o1 | 200k | 128k | Reasoning model |
| o1-mini | 128k | 128k | Reasoning, smaller |

### Other Models

| Model | Hard Limit | Soft Limit (Zed) | Notes |
|-------|------------|------------------|-------|
| Gemini 1.5 Pro | 1M / 2M | 128k | Google |
| Gemini 1.5 Flash | 1M | 128k | Google, fast |
| Llama 3.1 405B | 128k | 128k | Meta, open source |

**Note:** Soft limits in Zed may be lower than hard limits for cost/performance reasons.

### Hard Rules
- **Plan a handoff once usage reaches ~105% of the soft limit** - Soft limits leave buffer below the hard limit
- **Track token usage** - System reports current usage
- **Summarize before handoff** - Update scratchpad, provide handoff prompt

### Soft Rules
- Prefer smaller context for simple tasks (cost efficiency)
- Break large tasks into multiple sessions
- Use scratchpads to persist state across sessions

---

## General Rules

### Hard Rules
- **No sensitive data exposure** in logs, errors, or code
- **All changes require review** before merging to main
- **Never execute untrusted code** or arbitrary downloads
- **Cite external sources** when adapting code
- **Document all public APIs** with docstrings and examples
- **Principle of least privilege** for data, access, and dependencies
- **No system-altering commands by agents** (nixos-rebuild, nix build, etc.)
  - Agents may provide commands in code blocks for instruction
  - User must explicitly approve system state changes
- **Secrets never committed** to repository

### Soft Rules
- Prefer declarative, reproducible configurations
- Document major changes in README/CHANGELOG
- Use meaningful commit messages (Conventional Commits)
- Periodically clean `archive/` directories
- Add comments to non-trivial code

---

## File Editing

### Hard Rules
- **Prefer edit over overwrite** - Targeted changes unless complete rewrite needed
- **Preserve existing content** - Add/modify sections, don't replace entire files

### Rationale
Overwriting loses context, makes diffs harder to review, and risks removing important content.

---

## AI Assistant Communication

### Hard Rules
- **Never lie or mislead** - Admit uncertainty explicitly
- **Never appease too quickly** - Challenge ideas, point out flaws
- **Be brutally honest about tradeoffs** - State costs clearly
- **Never hide potential issues** - Report risks immediately
- **Distinguish fact from speculation** clearly

### Soft Rules
- Be conservative rather than enthusiastic
- Question your own suggestions
- Ask clarifying questions before proposing solutions
- State assumptions explicitly
- Warn about complexity upfront
- Acknowledge simpler alternatives

### Rationale
Over-confidence leads to flawed ideas, missed edge cases, unfeasible plans, and loss of trust. Professional engineering requires honest, conservative assessment.

---

## Naming Conventions

### Hard Rules
- **Never use single-letter variable names** - Always descriptive
- **Never abbreviate names** - `user_repository` not `usr_repo`
- **No type encoding** - No Hungarian notation
- **No "I" prefix on interfaces**
- **No "Base" or "Abstract" class names**
- **No "Utils" or "Helper" classes** - Organize into proper modules
- **Include units in names** only when types can't encode them
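
The units rule can be sketched as follows (the function names here are hypothetical):

```python
from datetime import timedelta

def retry_after(delay: float) -> None:
    """Ambiguous: seconds or milliseconds? The unit lives only in the author's head."""

def retry_after_seconds(delay_seconds: float) -> None:
    """Acceptable: the unit is in the name because ``float`` cannot encode it."""

def retry_with(delay: timedelta) -> None:
    """Preferred: the type encodes the unit, so the name stays clean."""
```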

### Soft Rules
- Prefer longer, descriptive names
- If struggling to name, consider if structure is the problem

---

## Python Rules

### Hard Rules
- **PEP8 formatting** mandatory
- **Type annotations required** for all public functions/methods/classes
- **Pydantic models** for public API data structures
- **Explicit exception handling** - no bare `except:`
- **≥73% code coverage** maintained
- **`__all__` alphabetically sorted** (Ruff RUF022)
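
Several of these rules can be sketched together — full annotations, an explicit exception type instead of a bare `except:`, and an alphabetically sorted `__all__` (the names are illustrative):

```python
__all__ = ["ParseError", "parse_port"]  # alphabetically sorted (RUF022)

class ParseError(ValueError):
    """Raised when a value cannot be parsed as a TCP port number."""

def parse_port(raw: str) -> int:
    """Parse a TCP port, failing loudly instead of swallowing errors."""
    try:
        port = int(raw)
    except ValueError as exc:  # explicit exception type, never bare ``except:``
        raise ParseError(f"not an integer: {raw!r}") from exc
    if not 0 < port < 65536:
        raise ParseError(f"out of range: {port}")
    return port
```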

### Soft Rules
- Type hints for non-trivial internal variables
- Use Python 3.12+ features where appropriate
- Minimal, explicit imports (no wildcards)
- Prefer `Protocol` over ABC for interfaces

---

## Dependency Management (UV)

### Hard Rules
- **Use `uv` exclusively** - not pip or pip-compile
- **Runtime deps** in `[project.dependencies]`
- **Dev deps** in `[dependency-groups.dev]`
- **Use `uv add/remove`** - never manually edit pyproject.toml
- **Commit pyproject.toml + uv.lock together**

### Soft Rules
- `uv sync` after pulling changes
- `uv lock --upgrade` for updates
- Pin critical production dependencies

### Rationale
uv is 10-100x faster than pip, its lockfile makes installs reproducible, it follows PEP 621, and `pyproject.toml` stays the single source of truth.

---

## MCP Server Design

### Hard Rules
- **Single entry point** (e.g., `server.py:main`)
- **Environment variable configuration** - no hardcoded secrets
- **Graceful shutdown** (SIGTERM/SIGINT handling)
- **Tool isolation** - independent, stateless where possible
- **Clear tool descriptions** for LLM understanding
- **Structured error returns** agents can understand
- **Reference-based results** for large data (RefCache)
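
Graceful shutdown can be sketched with an `asyncio` event tied to signal handlers (Unix-only: `add_signal_handler` is unavailable on Windows event loops; the startup/cleanup calls are hypothetical):

```python
import asyncio
import signal

async def serve_until_stopped() -> None:
    """Run until SIGTERM or SIGINT, then fall through to cleanup."""
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)
    # server = await start_server()  # hypothetical startup
    await stop.wait()                # returns once a signal arrives
    # await server.close()           # hypothetical cleanup
```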

### MCP Tool Return Types
- **`@cache.cached` tools MUST return `dict[str, Any]`**
  - Decorator wraps raw return in cache response
  - Annotation must match wrapped response, not raw data
  - Prevents MCP schema mismatch errors
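
The return-type rule can be illustrated with a stand-in decorator; the real `@cache.cached` from mcp-refcache may differ in detail, but the annotation principle is the same:

```python
from functools import wraps
from typing import Any, Callable

def cached(func: Callable[..., Any]) -> Callable[..., dict[str, Any]]:
    """Stand-in for @cache.cached: wraps the raw return in a cache envelope."""
    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        raw = func(*args, **kwargs)
        # Hypothetical envelope shape; the real cache response may differ
        return {"ref": f"cache://{func.__name__}", "preview": raw}
    return wrapper

@cached
def list_items() -> dict[str, Any]:  # annotation matches the WRAPPED response
    return {"items": ["a", "b", "c"]}  # raw data the decorator will wrap
```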

### Soft Rules
- 12-factor app principles
- Fail fast (validate config at startup)
- Sensible defaults
- Feature flags via environment
- Connection pooling
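
Fail-fast configuration can be sketched as a startup check (the variable names are hypothetical):

```python
import os

class ConfigError(RuntimeError):
    """Raised at startup when required configuration is missing."""

def load_config() -> dict[str, str]:
    """Validate required environment variables before serving anything."""
    required = ("MCP_API_KEY", "MCP_CACHE_URL")  # hypothetical variable names
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise ConfigError(f"missing required environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in required}
```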

---

## Documentation

### Hard Rules
- **All public APIs have docstrings** (classes, methods, functions, constants)
- **Include**: summary, parameters, returns, exceptions
- **Examples** in docstrings for non-trivial APIs
- **README accuracy** - shown code must work
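
A docstring covering summary, parameters, returns, exceptions, and an example might look like this (Google style, per the soft rules; the function is illustrative):

```python
def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most ``size`` elements.

    Args:
        items: The sequence to split; may be empty.
        size: Maximum chunk length; must be positive.

    Returns:
        A list of chunks preserving the original order.

    Raises:
        ValueError: If ``size`` is not positive.

    Example:
        >>> chunk(["a", "b", "c"], 2)
        [['a', 'b'], ['c']]
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i : i + size] for i in range(0, len(items), size)]
```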

### Soft Rules
- Google-style docstrings
- Document edge cases and gotchas
- Include "Getting Started" in README
- Architecture documentation for contributors

---

## Testing

### Hard Rules
- **Test behavior, not implementation**
- **One thing per test** - focused test cases
- **Deterministic tests** - mock time, randomness, I/O
- **Test error paths** - exceptions with correct types/messages
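
One way to keep tests deterministic is to inject the clock and RNG instead of reading global state (patching with `unittest.mock` works equally well); a sketch:

```python
import random
from datetime import datetime, timezone

def make_token(now: datetime, rng: random.Random) -> str:
    """Build a timestamped token; time and randomness are injected, not global."""
    return f"{now:%Y%m%d}-{rng.randint(1000, 9999)}"

def test_make_token_is_deterministic() -> None:
    fixed_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Same inputs, same seed: the test can assert a repeatable result
    assert make_token(fixed_now, random.Random(42)) == make_token(fixed_now, random.Random(42))
```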

### Soft Rules
- Use pytest fixtures
- Descriptive test names
- Integration tests for component interaction
- Test edge cases (empty, None, boundaries)

---

## Dependency Injection & Design Patterns

### Hard Rules
- **Constructor injection only** - Never property/setter injection or service locators
- **No circular dependencies** - Extract third class or use events
- **Protocol-based contracts** - All dependencies are protocols, not concrete classes
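
Constructor injection against a `Protocol` keeps wiring explicit and makes test doubles trivial; a sketch with hypothetical names:

```python
from typing import Protocol

class MessageSink(Protocol):
    def send(self, text: str) -> None: ...

class Notifier:
    """Dependencies arrive through the constructor; no setters, no locator."""
    def __init__(self, sink: MessageSink) -> None:
        self._sink = sink

    def notify(self, event: str) -> None:
        self._sink.send(f"event: {event}")

class FakeSink:
    """Test double that satisfies MessageSink structurally."""
    def __init__(self) -> None:
        self.sent: list[str] = []
    def send(self, text: str) -> None:
        self.sent.append(text)
```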

### Soft Rules
- **Factory pattern** for complex construction
- **Explicit over implicit** - Pass dependencies explicitly, avoid global state
- **Test-friendly design** - If hard to test (many mocks), refactor
- **Adapter pattern** for external integrations - Wrap third-party APIs behind protocols
- **Document injection points** - Type hints and docstrings show what/why

---

## AI Assistant Workflow

**THIS WORKFLOW APPLIES TO BOTH GOALS AND TASKS** - Follow all 5 steps for each level of work.

### Hard Rules: Development Process

**Step 1: Gather Context**
- Read relevant files (README, CONTRIBUTING, code)
- Review appropriate scratchpad (`.agent/scratchpad.md`, goal scratchpad, or task scratchpad)
- Research with available tools
- Ask clarifying questions
- MUST NOT jump to implementation without approval

**Step 2: Document Plan**
- Update appropriate scratchpad:
  - **For Goals:** Update goal scratchpad with objectives, architecture, task breakdown
  - **For Tasks:** Update task scratchpad with implementation plan, files to modify, test strategy
  - Document what and why
  - Architectural decisions and tradeoffs
  - Files to modify
  - Success criteria
- Use TODO lists with checkboxes

**Step 3: Pitch Approach**
- Describe high-level approach
- List specific files and changes
- Explain design decisions and alternatives
- Identify risks or breaking changes
- **Wait for user approval before proceeding**

**Step 4: Implement**
- **For Goals:** Create task directories and scratchpads if multi-task
- **For Tasks:** Write tests first (or alongside code), follow all paradigms
- Small, reviewable changes
- Run tests and linters after each change

**Step 5: Document & Review**
- Update docstrings
- Update README if API changed
- Add examples
- **Update scratchpad with completed work** (what was done, not just what will be done)
- Run full test suite

### Workflow Example

**Scenario:** User requests two goals

**Goal 1 (Simple):**
1. Gather context, research
2. Create `goals/01-Goal-Name/scratchpad.md` with plan
3. Pitch approach, wait for approval
4. Implement directly (no tasks needed)
5. Document completion in goal scratchpad

**Goal 2 (Complex):**
1. Gather context, research
2. Create `goals/02-Goal-Name/scratchpad.md` with:
   - Objectives and success criteria
   - Architecture decisions
   - **Proposed task breakdown** (list 3-5 tasks)
3. Pitch approach (including task structure), wait for approval
4. Create task directories: `Task-01-Name/`, `Task-02-Name/`, etc.
5. Document task overview in each task scratchpad

**Task 01 of Goal 2:**
1. Gather context specific to this task
2. Update `Task-01-Name/scratchpad.md` with:
   - Implementation plan
   - Files to create/modify
   - Test strategy
   - Expected outcomes
3. Pitch approach, wait for approval
4. Implement with TDD
5. Document **what was completed** in task scratchpad

**Task 02 of Goal 2:**
1. Repeat Steps 1-5 for Task 02
2. Never skip Step 3 (approval) even for "small" tasks

### Proper Tool Usage
- `read_file` for reading (not cat/head/tail)
- `edit_file` for modifying (not sed/awk)
- `grep` tool for searching (not terminal grep)
- `find_path` for file discovery (not terminal find)
- Terminal ONLY for: tests, linters, builds, package management, git, system checks

### Hard Rules: Code Quality
- **Run `ruff check . --fix && ruff format .`** before committing
- **Tests must pass** with `pytest`
- **Maintain ≥73% coverage**
- **No skipping approval** even for "small" changes

### Anti-Pattern: What NOT to Do
❌ Create task, immediately implement, then document after
✅ Create task, document plan, get approval, implement, document completion

---

## Session Handoff

### Hard Rules
- **Provide prompt in codebox** (triple backticks)
- **Escape any triple backticks inside the codebox** - Use \`\`\` to prevent breaking the outer codebox
- **NOT in scratchpad** - only in chat
- **Keep short** (15-30 lines)
- **Reference scratchpad** for details
- **Update scratchpad first**

### Prompt Structure
```
[Task Title]

Context: Brief state + reference to scratchpad
What Was Done: 3-5 bullet points
Current Task: Clear next steps + specific files
Guidelines: Session-specific constraints (if any - do NOT repeat .rules)
```

**Note:** Guidelines section is ONLY for session-specific overrides or additions
the user provided in chat. Never repeat what's already in .rules file.

---

## Goals & Tasks Structure

### Directory Structure
```
.agent/
├── scratchpad.md           # High-level index
└── goals/
    ├── scratchpad.md       # Goals tracking
    ├── 00-Template-Goal/   # Template
    └── 01-10-*/            # Numbered goals
        ├── scratchpad.md
        └── Task-01-*/
            └── scratchpad.md
```

### Status Tracking

**Task/Goal Status Definitions:**
- ⚪ **Not Started** - Planned but not begun
- 🟡 **In Progress** - Actively working, implementation ongoing
- 🟠 **Implemented** - Code complete, tests passing, NOT yet user-validated
- 🟢 **Complete** - Implemented AND validated/tested by user in real environment
- 🔴 **Blocked** - Waiting on dependency or decision
- ⚫ **Archived** - Abandoned or deprioritized

**Hard Rules:**
- **NEVER mark Complete (🟢) without user validation**
  - Tests passing ≠ Complete
  - Code working in isolation ≠ Complete
  - Only mark Complete after user confirms functionality in real use
- **Use Implemented (🟠) when code is done but not validated**
  - Tests pass, linting clean, code merged
  - Ready for user testing
  - Awaiting real-world validation
- **Document what validation is needed** in scratchpad before marking Complete

**Rationale:**
Agent-only testing can miss integration issues, UX problems, and real-world edge cases. User validation is the final gate to completion.

### Best Practices
- Keep scratchpads current
- Document decisions with rationale
- Link related tasks
- Use acceptance criteria as checklist

### Scratchpad Hierarchy & Content
- **Main scratchpad** (`.agent/scratchpad.md`) - High-level session state only
  - Current work-in-progress
  - Quick links to active goals/tasks
  - Recent session notes
  - **DO NOT duplicate** detailed plans, decisions, or task breakdowns
- **Goals scratchpad** (`.agent/goals/NN-*/scratchpad.md`) - Goal-level planning
  - Objectives, success criteria, architecture decisions
  - Task breakdown and dependencies
  - **DO NOT duplicate** task-level implementation details
- **Task scratchpad** (`.agent/goals/NN-*/Task-NN-*/scratchpad.md`) - Implementation details
  - Specific code changes, file modifications
  - Test cases, edge cases
  - Implementation notes
- **Reference, don't duplicate** - Link to detailed scratchpads instead of copying content

---

## Project Organization

### Hard Rules
- **Never commit `archive/`** - gitignored, local only
- **Use `archive/` for local files** - experiments, old versions, research

### Soft Rules
- Organize by date or topic
- Document archived decisions (ADRs)
- Clean periodically

---

## Nix Development

### Hard Rules
- **Bash variables in Nix strings**: `''${VAR}` not `${VAR}`
- **No spaces in parameter expansion**: `''${VAR:-default}` not `''${VAR: -default}`

### Rationale
Nix uses `${...}` for interpolation. Escape bash variables with `''${...}`.

---

## Versioning

### Hard Rules
- **First version is always 0.0.0** - Tests both initial implementation AND release workflow
- **Publish 0.0.0 to PyPI** - Real-world validation of packaging before iterating
- **Second release is 0.0.1** - After fixes from 0.0.0 learnings
- **Never start at 0.1.0 or 1.0.0** - Build confidence through incremental releases
- **Semantic versioning**: MAJOR.MINOR.PATCH
- **Sync pyproject.toml and __version__**

### Version Progression
- `0.0.0` - **First published release** - Tests initial code + packaging workflow
- `0.0.1` - Bug fixes and improvements from 0.0.0 feedback
- `0.0.2`, `0.0.3`, etc. - Continued iterations and refinements
- `0.1.0` - After 5-10 patch releases (0.0.x) and proven stability
- `0.2.0`, `0.3.0`, etc. - New features with backward compatibility
- `1.0.0` - Production-ready, stable API, battle-tested in production

### Soft Rules
- Document all releases in CHANGELOG.md (including 0.0.0)
- Git tag all releases (v0.0.0, v0.0.1, v0.0.2, etc.)
- Expect 0.0.0 to have issues - that's the point
- Wait for at least 5-10 patch releases (0.0.x) before 0.1.0
- Consider 1.0.0 only after 6+ months of stable 0.x usage
- Use pre-release versions for testing new features (0.1.0a1, 0.2.0rc1)

### Rationale
Starting at 0.0.0 accomplishes two critical goals:
1. **Tests implementation** - Validates the initial code works
2. **Tests release workflow** - Forces you to get packaging right from day one

Version 0.0.0 signals "experimental test release - expect issues" which is honest.
Jumping to 0.1.0 implies maturity that new projects haven't earned yet.

The permanent PyPI record of 0.0.0 is a feature, not a bug:
- Documents the project's honest journey
- Users understand 0.0.0 means "very early experimental"
- Forces discipline in getting basics right early
- One workflow to learn (no TestPyPI vs PyPI confusion)

Build trust through incremental releases: 0.0.0 → 0.0.x → 0.1.0 → 1.0.0

---

## Deployment

### Hard Rules
- **Tests pass** (100% rate)
- **Linting passes** (ruff check)
- **Docker builds** successfully
- **CHANGELOG updated**
- **No debug mode** in production

### Soft Rules
- Test images locally
- Multi-stage builds for size
- Pin base image versions
- Security scans
- Health checks

---

## Docker Configuration

### Hard Rules
- **Non-root user** in production
- **No secrets in Dockerfile** - use env vars
- **Explicit EXPOSE** for ports
- **HEALTHCHECK instruction** for orchestration

### Best Practices
```dockerfile
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
RUN useradd --create-home appuser
WORKDIR /app
# Install dependencies first so this layer is cached independently of source changes
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project
COPY src/ ./src/
RUN uv sync --frozen --no-dev
USER appuser
EXPOSE 8000
# python:slim ships without curl; probe the health endpoint with the stdlib instead
HEALTHCHECK --interval=30s --timeout=3s \
  CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
CMD ["uv", "run", "your-mcp-server"]
```

---

**End of .rules**
