Metadata-Version: 2.4
Name: civyk-repoix
Version: 0.3.0
Summary: Civyk Repo Index - Codebase indexing service for AI coding agents
Author: Civyk
License: MIT
Project-URL: Homepage, https://github.com/civyk/civyk-repoix
Project-URL: Documentation, https://github.com/civyk/civyk-repoix#readme
Project-URL: Repository, https://github.com/civyk/civyk-repoix
Project-URL: Issues, https://github.com/civyk/civyk-repoix/issues
Keywords: mcp,codebase,indexing,ai,coding-agents
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: C
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pathspec>=0.12.0
Requires-Dist: tree-sitter>=0.21.0
Requires-Dist: tree-sitter-python>=0.21.0
Requires-Dist: tree-sitter-javascript>=0.21.0
Requires-Dist: tree-sitter-typescript>=0.21.0
Requires-Dist: tree-sitter-java>=0.21.0
Requires-Dist: tree-sitter-go>=0.21.0
Requires-Dist: tree-sitter-c-sharp>=0.21.0
Requires-Dist: tree-sitter-rust>=0.21.0
Requires-Dist: tree-sitter-ruby>=0.21.0
Requires-Dist: tree-sitter-php>=0.23.0
Requires-Dist: tree-sitter-sql>=0.3.0
Requires-Dist: tree-sitter-markdown>=0.3.0
Requires-Dist: tree-sitter-json>=0.24.0
Requires-Dist: tree-sitter-yaml>=0.6.0
Requires-Dist: tree-sitter-toml>=0.6.0
Requires-Dist: tree-sitter-xml>=0.6.0
Requires-Dist: watchdog>=4.0.0
Requires-Dist: aiosqlite>=0.19.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: gitpython>=3.1.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: mypy>=1.8.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: types-PyYAML>=6.0.0; extra == "dev"
Requires-Dist: pyinstaller>=6.0.0; extra == "dev"
Requires-Dist: debugpy>=1.8.0; extra == "dev"
Requires-Dist: nuitka>=2.0.0; extra == "dev"
Requires-Dist: ordered-set>=4.1.0; extra == "dev"

# Civyk Repo Index

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![MCP Compatible](https://img.shields.io/badge/MCP-Compatible-purple.svg)](https://modelcontextprotocol.io/)

**Semantic code intelligence for AI coding agents** — Give your AI assistant deep understanding of your codebase through the Model Context Protocol (MCP).

> **If you find this useful, please consider giving it a star on GitHub! It helps others discover the project.**

______________________________________________________________________

## Local-First, Private, Secure

**Your code never leaves your machine.** Civyk Repo Index is a fully local MCP server:

- **100% offline** — No cloud services, no API calls, no telemetry
- **Your data stays yours** — All indexes and caches stored locally in SQLite
- **Works air-gapped** — Perfect for proprietary codebases and enterprise environments
- **Open source** — Audit the code yourself, MIT licensed

______________________________________________________________________

## Why Civyk Repo Index?

AI coding assistants have **limited context windows**. They can't read entire codebases. Civyk Repo Index provides **token-budgeted semantic code intelligence**:

- 🔍 **Symbol-aware search** — Find functions, classes, and types instantly
- 📦 **Smart context packs** — Auto-select relevant code within token budgets
- 🔗 **Relationship tracking** — Understand calls, imports, and inheritance
- ⚡ **Real-time indexing** — Always up-to-date with your code changes
- 🌳 **Multi-language** — Python, TypeScript, JavaScript, Java, Go, C#, Rust, Ruby, PHP
- 🔀 **Branch-aware** — Separate indexes per git branch
- 🧠 **AI Context Cache** — Persist code understanding across sessions, save 80-90% tokens

______________________________________________________________________

## Quick Start

### Installation

```bash
pip install civyk-repoix
```

### Setup for Your AI Agent

```bash
cd /path/to/your/project

# Interactive setup
civyk-repoix setup

# Or configure specific agent
civyk-repoix setup --ai claude        # Claude Code
civyk-repoix setup --ai cursor-agent  # Cursor
civyk-repoix setup --ai windsurf      # Windsurf
civyk-repoix setup --ai copilot       # GitHub Copilot
```

### Verify

```bash
civyk-repoix query index-status
```

______________________________________________________________________

## MCP Tools

36 tools for code intelligence. [Full reference →](docs/mcp-tools.md)

| Category | Tools |
|----------|-------|
| **Core** | `index_status`, `build_context_pack`, `search_symbols`, `get_symbol`, `get_references`, `get_components`, `get_api_endpoints`, `get_dependencies`, `force_reindex` |
| **Navigation** | `get_file_symbols`, `get_definition`, `get_callers` |
| **Discovery** | `list_files`, `get_file_imports`, `search_code` |
| **Git** | `get_recent_changes`, `get_hotspots`, `get_branch_diff` |
| **Analysis** | `get_dead_code`, `find_circular_dependencies`, `analyze_impact`, `get_tests_for`, `get_code_for_test`, `get_duplicate_code`, `get_tool_performance_stats` |
| **Advanced** | `get_type_hierarchy`, `get_related_files`, `find_similar` |
| **AI Cache** | `store_understanding`, `recall_understanding`, `get_understanding_stats`, `invalidate_understanding` |
| **Context** | `build_delta_context_pack`, `map_trace_to_symbols`, `get_recommended_tests`, `build_doc_pack` |

> **Tip:** Use `recall_understanding` before reading files — cached analysis saves 80-90% of tokens.

______________________________________________________________________

## AI Context Cache — Your AI Remembers

**The killer feature: Your AI assistant remembers what it learned about your code.**

Traditional AI coding assistants forget everything when you start a new chat or session. With Civyk Repo Index's AI Context Cache, understanding persists:

```text
Monday:    AI reads auth.py → analyzes → stores understanding
Tuesday:   New chat → AI recalls cached understanding → no file read needed!
Wednesday: You modify auth.py → cache auto-invalidates → AI re-analyzes
```

### Why This Matters

| Without Cache | With Cache |
|---------------|------------|
| AI re-reads files every session | AI recalls previous analysis instantly |
| Wastes tokens on repeated reads | **80-90% token savings** |
| Slow context building | Sub-millisecond recall |
| Understanding lost on chat restart | **Persists across sessions and chats** |

### How It Works

1. **First encounter**: AI reads a file, analyzes it, calls `store_understanding`
1. **Future sessions**: AI calls `recall_understanding` first — gets cached analysis
1. **File changes**: Cache auto-invalidates via content hash — AI re-analyzes
1. **Per-repository**: Each repo has its own persistent cache
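
The store, recall, and invalidate steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the `UnderstandingCache` class, its method names, the in-memory dict, and the choice of SHA-256 are all assumptions made for the example (the real cache is stored in SQLite).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedUnderstanding:
    """One cached analysis, tied to the file content it was derived from."""

    content_hash: str
    analysis: str


def content_hash(data: bytes) -> str:
    """Hash file contents. SHA-256 is an assumption, not a documented choice."""
    return hashlib.sha256(data).hexdigest()


class UnderstandingCache:
    """Hypothetical in-memory stand-in for the SQLite-backed cache."""

    def __init__(self) -> None:
        self._entries = {}  # maps file path -> CachedUnderstanding

    def store(self, path: str, data: bytes, analysis: str) -> None:
        self._entries[path] = CachedUnderstanding(content_hash(data), analysis)

    def recall(self, path: str, data: bytes) -> Optional[str]:
        entry = self._entries.get(path)
        if entry is None:
            return None  # nothing cached yet: the caller must analyze the file
        if entry.content_hash != content_hash(data):
            # The file changed since the analysis was stored: drop the stale entry.
            del self._entries[path]
            return None
        return entry.analysis
```

Keying the entry on a content hash rather than a modification time means that touching a file without changing it keeps the cached analysis valid.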

### Cache Tools

| Tool | Purpose |
|------|---------|
| `recall_understanding` | **Call FIRST** before reading any file — retrieves cached analysis |
| `store_understanding` | Persist AI's analyzed understanding after reading files |
| `get_understanding_stats` | **Session start**: List cached targets, filter by path/scope, sort, check freshness |
| `invalidate_understanding` | Manually clear cached entries when needed |

`store_understanding` supports structured fields (purpose, key_points, gotchas) plus a free-form `analysis` field for complex business logic, state machines, and workflows.
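
As a rough picture of what such a record might look like, the sketch below models the four named fields as a dataclass and serializes them to JSON. The `Understanding` class and the sample values are hypothetical; only the field names `purpose`, `key_points`, `gotchas`, and `analysis` come from the description above, and the tool's actual parameter schema may differ (see `civyk-repoix query --schema`).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Understanding:
    """Hypothetical container for the fields named in this README."""

    purpose: str
    key_points: List[str] = field(default_factory=list)
    gotchas: List[str] = field(default_factory=list)
    analysis: str = ""  # free-form notes for logic that does not fit the lists


record = Understanding(
    purpose="Token validation for the login flow",
    key_points=["Validates the token signature", "Loads the user record"],
    gotchas=["Expired tokens raise instead of returning None"],
    analysis="State machine: anonymous -> pending -> authenticated.",
)

# A plain dict is easy to send as JSON tool arguments.
payload = asdict(record)
payload_json = json.dumps(payload, indent=2)
```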

> **Pro tip**: The AI Context Cache is stored locally in SQLite alongside your code index. Your analysis never leaves your machine.

______________________________________________________________________

## Language Support

| Tier | Languages |
|------|-----------|
| **Full** | Python, TypeScript, JavaScript |
| **Standard** | Java, Go, C#, Rust, Ruby, PHP |
| **SQL** | T-SQL, PL/SQL, Standard SQL |
| **Docs** | Markdown |

______________________________________________________________________

## Architecture

Daemon-based architecture for multi-repository support. [Full details →](docs/architecture.md)

```mermaid
graph LR
    IDE[IDE] --> Shim[stdio Shim] --> Daemon[Daemon Manager] --> Workers[Repository Workers] --> DB[(SQLite)]
```

**Key Components:**

- **Daemon Manager** — Coordinates worker lifecycle
- **Repository Worker** — One per repo, handles indexing and queries
- **Indexer** — Tree-sitter parsing, symbol extraction
- **Context Builder** — Token-budgeted context generation
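
To make "token-budgeted" concrete, here is a minimal sketch of one way to pack snippets under a budget: sort by relevance and greedily keep whatever still fits. The `Snippet` type, the relevance scores, and the four-characters-per-token estimate are assumptions for illustration; the Context Builder's actual ranking and token counting are not described in this README.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snippet:
    name: str
    text: str
    relevance: float  # higher means more relevant to the task


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token), not a real tokenizer."""
    return max(1, len(text) // 4)


def build_pack(snippets: List[Snippet], token_budget: int) -> List[Snippet]:
    """Greedily keep the most relevant snippets that still fit the budget."""
    selected = []
    remaining = token_budget
    for snippet in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = estimate_tokens(snippet.text)
        if cost <= remaining:
            selected.append(snippet)
            remaining -= cost
    return selected
```

A greedy pass is simple and predictable, but it can skip a large, highly relevant snippet in favor of several smaller ones; a production packer might truncate or summarize instead.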

______________________________________________________________________

## CLI Mode

Run the same tools directly from the command line, without an MCP client:

```bash
civyk-repoix query search-symbols --query "%User%" --kind class
civyk-repoix query build-context-pack --task "implement auth" --token-budget 1000
civyk-repoix query --schema  # Get JSON schema of all tools
```
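
The `%User%` argument above looks like a SQL `LIKE` pattern, where `%` matches any run of characters. That reading is an assumption based on the syntax and on the SQLite storage layer, not something this README states. Under that assumption, the matching behaves like this sketch, which uses an in-memory database and a hypothetical two-column `symbols` table:

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symbols (name TEXT, kind TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?)",
    [
        ("UserService", "class"),
        ("AdminUser", "class"),
        ("get_user", "function"),
        ("Order", "class"),
    ],
)


def search_symbols(pattern: str, kind: str) -> List[str]:
    """Return symbol names matching a LIKE pattern, filtered by kind."""
    rows = conn.execute(
        "SELECT name FROM symbols WHERE name LIKE ? AND kind = ? ORDER BY name",
        (pattern, kind),
    )
    return [name for (name,) in rows]
```

With this data, `search_symbols("%User%", "class")` returns `AdminUser` and `UserService`. Note that SQLite's `LIKE` is case-insensitive for ASCII characters by default, so `%User%` would also match `get_user` if the kind filter allowed it.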

______________________________________________________________________

## Configuration

Location: `~/.config/civyk-repoix/config.yaml`

```yaml
index:
  max_file_size_mb: 10
  debounce_ms: 500

daemon:
  max_workers: 10
  idle_worker_timeout_s: 3600

context:
  default_token_budget: 800
  max_token_budget: 4000
```

**Environment Variables:**

| Variable | Default | Description |
|----------|---------|-------------|
| `CIVYK_LOG_LEVEL` | INFO | Log level |
| `REPOIX_PARSE_WORKERS` | CPU count | Parallel parsing workers |
| `REPOIX_CACHE_TTL` | 60 | Query cache TTL (seconds) |
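
For scripts that wrap the tool, the table above can be mirrored with a small helper. This is a sketch under stated assumptions: `load_env_settings` and the returned key names are hypothetical, and only the variable names and defaults come from the table.

```python
import os
from typing import Mapping, Optional


def load_env_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "log_level": env.get("CIVYK_LOG_LEVEL", "INFO"),
        # os.cpu_count() can return None, so fall back to a single worker.
        "parse_workers": int(env.get("REPOIX_PARSE_WORKERS", os.cpu_count() or 1)),
        "cache_ttl_s": int(env.get("REPOIX_CACHE_TTL", "60")),
    }
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test, for example `load_env_settings({"REPOIX_CACHE_TTL": "120"})`.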

______________________________________________________________________

## Performance

| Operation | Performance |
|-----------|-------------|
| Symbol search | < 1ms |
| Build context pack | < 10ms |
| Full index (1K files) | 5-10s |
| Delta index | < 1s |

______________________________________________________________________

## Documentation

- [MCP Tools Reference](docs/mcp-tools.md) — All 36 tools with examples
- [Architecture](docs/architecture.md) — System design and components
- [Quickstart Guide](docs/quickstart.md) — Developer setup
- [Specification](specs/001-repo-index/spec.md) — Full requirements

______________________________________________________________________

## Code Structure

The project follows a modular architecture with clear separation of concerns:

```text
src/civyk_repoix/
├── cli.py              # Command-line interface entry point
├── cli_tools.py        # CLI tool implementations for query subcommands
├── config.py           # Configuration dataclasses and loading
├── service.py          # Standalone service lifecycle management
├── mcp_server.py       # MCP protocol server (standalone mode)
├── mcp_responses.py    # Response builders (single source of truth)
├── tool_handlers.py    # Shared MCP tool handler implementations
├── tool_schemas.py     # Tool parameter schemas for CLI/MCP
├── tool_descriptions.py # AI-agent-friendly tool descriptions
├── exceptions.py       # Custom exceptions
├── logging_setup.py    # Logging configuration
│
├── daemon/             # Daemon architecture for multi-repo support
│   ├── manager.py      # DaemonManager - coordinates worker pool
│   ├── worker.py       # RepoWorker - per-repository handler
│   └── shim.py         # stdio-to-socket bridge for MCP
│
├── workers/            # Core processing workers
│   ├── indexer.py      # File indexer with Tree-sitter parsing
│   └── context_builder.py # Token-budgeted context generation
│
├── storage/            # Data persistence layer
│   ├── database.py     # SQLite wrapper with WAL mode
│   ├── repository.py   # High-level data access pattern
│   └── cache.py        # Query and context pack caching
│
├── engines/            # Language parsing
│   ├── treesitter.py   # Tree-sitter parsing engine
│   └── languages.py    # Language detection and grammar loading
│
├── models/             # Domain entities (immutable dataclasses)
│   ├── symbol.py       # Code symbols (classes, functions, etc.)
│   ├── file.py         # Source files with metadata
│   ├── edge.py         # Symbol relationships
│   ├── component.py    # Architectural components
│   ├── status.py       # Health status
│   └── ai_understanding.py # AI analysis cache
│
├── monitors/           # Change detection
│   ├── file_watcher.py # File system monitoring
│   ├── git_watcher.py  # Git branch detection
│   └── parent_watcher.py # IDE process monitoring
│
├── transport/          # Communication layer
│   ├── protocol.py     # Framed JSON-RPC protocol
│   └── socket.py       # Socket server/client
│
├── security/           # Security utilities
│   └── scanner.py      # Secret detection and redaction
│
├── setup/              # Agent configuration
│   ├── config.py       # Agent registry
│   └── mcp_setup.py    # MCP server setup
│
└── utils/              # Shared utilities
    ├── git_utils.py    # Git operations
    ├── path_filter.py  # Path filtering for indexer
    ├── language_filter.py # Source code detection
    └── timing.py       # Performance metrics
```
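
The `transport/protocol.py` entry mentions a framed JSON-RPC protocol. The wire format is not described in this README, so the sketch below only illustrates the general idea of framing, assuming a 4-byte big-endian length prefix in front of each UTF-8 JSON payload. The function names are hypothetical.

```python
import json
import struct
from typing import List, Tuple

HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix (assumed format)


def encode_frame(message: dict) -> bytes:
    """Serialize one message and prepend its byte length."""
    payload = json.dumps(message).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def decode_frames(buffer: bytes) -> Tuple[List[dict], bytes]:
    """Return every complete message in the buffer plus any leftover bytes."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # incomplete frame: wait for more bytes from the socket
        messages.append(json.loads(buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]
```

Framing matters on a stream socket because a single read may return half a message or several messages at once; the leftover bytes are carried into the next call.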

### Key Patterns

- **Response Builders**: All MCP responses defined in `mcp_responses.py`
- **Handler Delegation**: Tool logic in `tool_handlers.py`, called by both daemon and standalone modes
- **Immutable Models**: Domain entities use `@dataclass(frozen=True, slots=True)`
- **Async Architecture**: Core operations use `asyncio` with thread pools for CPU-bound work
- **Single Source of Truth**: Tool schemas, descriptions, and responses each in one module

______________________________________________________________________

## Contributing

```bash
git clone https://github.com/civyk/civyk-repoix.git
cd civyk-repoix
pip install -e ".[dev]"
pytest  # add "-n auto" only if pytest-xdist is installed; it is not part of the dev extra
```

**Guidelines:** Use `ruff format`, add type hints, maintain >80% test coverage.

______________________________________________________________________

## License

MIT — see [LICENSE](LICENSE)
