Metadata-Version: 2.4
Name: chunkhound
Version: 4.1.0a1
Summary: Local-first codebase intelligence for AI assistants via MCP
Project-URL: Homepage, https://github.com/chunkhound/chunkhound
Project-URL: Repository, https://github.com/chunkhound/chunkhound
Project-URL: Issues, https://github.com/chunkhound/chunkhound/issues
Project-URL: Documentation, https://github.com/chunkhound/chunkhound#readme
Project-URL: Changelog, https://github.com/chunkhound/chunkhound/releases
Author: Ofri Wolfus
License-File: LICENSE
Keywords: AI agents,CAST algorithm,MCP,code search,semantic search,vector search
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: <3.14,>=3.10
Requires-Dist: aiohttp>=3.12.15
Requires-Dist: anthropic>=0.75.0
Requires-Dist: duckdb>=1.4.0
Requires-Dist: google-genai>=1.51.0
Requires-Dist: httpx>=0.28.1
Requires-Dist: lancedb>=0.25.3
Requires-Dist: lark>=1.1.0
Requires-Dist: loguru>=0.6.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: msgpack>=1.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: openai>=1.0.0
Requires-Dist: packaging>=21.0
Requires-Dist: pandas>=2.3.0
Requires-Dist: pathspec>=0.12.1
Requires-Dist: psutil>=5.8.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.11.0
Requires-Dist: pygit2>=1.12.0
Requires-Dist: pylance>=0.31.0
Requires-Dist: pymupdf>=1.23.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rapidyaml>=0.10.0
Requires-Dist: readchar>=4.2.1
Requires-Dist: rich>=13.0.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: tiktoken>=0.9.0
Requires-Dist: tomli>=2.0.0; python_version < '3.11'
Requires-Dist: tree-sitter-bash>=0.25.0
Requires-Dist: tree-sitter-c-sharp>=0.23.1
Requires-Dist: tree-sitter-c>=0.24.1
Requires-Dist: tree-sitter-cpp>=0.23.4
Requires-Dist: tree-sitter-elixir>=0.3.4
Requires-Dist: tree-sitter-go>=0.25.0
Requires-Dist: tree-sitter-groovy>=0.1.2
Requires-Dist: tree-sitter-haskell>=0.23.1
Requires-Dist: tree-sitter-java>=0.23.5
Requires-Dist: tree-sitter-javascript>=0.25.0
Requires-Dist: tree-sitter-json>=0.24.8
Requires-Dist: tree-sitter-kotlin>=1.1.0
Requires-Dist: tree-sitter-language-pack<1.0.0,>=0.7.3
Requires-Dist: tree-sitter-lua>=0.4.0
Requires-Dist: tree-sitter-make>=0.1.0
Requires-Dist: tree-sitter-markdown>=0.5.1
Requires-Dist: tree-sitter-php>=0.24.0
Requires-Dist: tree-sitter-python>=0.25.0
Requires-Dist: tree-sitter-rust>=0.24.0
Requires-Dist: tree-sitter-sql>=0.3.0
Requires-Dist: tree-sitter-toml>=0.7.0
Requires-Dist: tree-sitter-typescript>=0.23.2
Requires-Dist: tree-sitter-zig>=1.1.0
Requires-Dist: tree-sitter>=0.25.0
Requires-Dist: voyageai>=0.2.0
Requires-Dist: watchdog>=4.0.0
Requires-Dist: xxhash>=3.0.0
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: mypy>=1.6.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-rerunfailures>=13.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: test
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'test'
Requires-Dist: pytest-rerunfailures>=13.0; extra == 'test'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'test'
Requires-Dist: pytest>=7.4.0; extra == 'test'
Description-Content-Type: text/markdown

<p align="center">
  <a href="https://chunkhound.github.io">
    <picture>
      <source media="(prefers-color-scheme: dark)" srcset="public/wordmark-centered-dark.svg">
      <img src="public/wordmark-centered.svg" alt="ChunkHound" width="400">
    </picture>
  </a>
</p>

<p align="center">
  <strong>Local-first codebase intelligence</strong>
</p>

<p align="center">
  <a href="https://github.com/chunkhound/chunkhound/actions/workflows/smoke-tests.yml"><img src="https://github.com/chunkhound/chunkhound/actions/workflows/smoke-tests.yml/badge.svg" alt="Tests"></a>
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT"></a>
  <img src="https://img.shields.io/badge/100%25%20AI-Generated-ff69b4.svg" alt="100% AI Generated">
  <a href="https://discord.gg/BAepHEXXnX"><img src="https://img.shields.io/badge/Discord-Join_Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
</p>

Your AI assistant searches code but doesn't understand it. ChunkHound researches your codebase, extracting architecture, patterns, and institutional knowledge at any scale. It integrates with AI assistants via [MCP](https://spec.modelcontextprotocol.io/).

## Features

- **[cAST Algorithm](https://arxiv.org/pdf/2506.15655)** - Research-backed semantic code chunking
- **[Multi-Hop Semantic Search](https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search)** - Discovers interconnected code relationships beyond direct matches
- **Semantic search** - Natural language queries like "find authentication code"
- **Regex search** - Pattern matching without API keys
- **Local-first** - Your code stays on your machine
- **32 languages** with structured parsing
  - **Programming** (via [Tree-sitter](https://tree-sitter.github.io/tree-sitter/)): Python, JavaScript, TypeScript, JSX, TSX, Java, Kotlin, Groovy, C, C++, C#, Go, Rust, Haskell, Swift, Bash, MATLAB, Makefile, Objective-C, PHP, Dart, Lua, Vue, Svelte, Zig
  - **Configuration**: JSON, YAML, TOML, HCL, Markdown
  - **Text-based** (custom parsers): Text files, PDF
- **[MCP integration](https://spec.modelcontextprotocol.io/)** - Works with Claude, VS Code, Cursor, Windsurf, Zed, and other MCP clients
- **Real-time indexing** - Automatic file watching, smart diffs, seamless branch switching

## Documentation

**Visit [chunkhound.github.io](https://chunkhound.github.io) for complete guides:**
- [Quickstart](https://chunkhound.github.io/quickstart/)
- [Configuration Guide](https://chunkhound.github.io/configuration/)
- [Architecture Deep Dive](https://chunkhound.github.io/under-the-hood/)

## Requirements

- Python 3.10 to 3.13 (the package metadata requires `>=3.10,<3.14`)
- [uv package manager](https://docs.astral.sh/uv/)
- API keys (optional - regex search works without any keys)
  - **Embeddings**: [VoyageAI](https://dash.voyageai.com/) (recommended) | [OpenAI](https://platform.openai.com/api-keys) | [Local with Ollama](https://ollama.ai/)
  - **LLM (for Code Research)**: Claude Code CLI or Codex CLI (no API key needed) | [Anthropic](https://console.anthropic.com/) | [OpenAI](https://platform.openai.com/api-keys) | [Grok (xAI)](https://console.x.ai)
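
The package metadata declares `Requires-Python: <3.14,>=3.10`, so it can help to confirm that an interpreter falls inside that range before installing. The snippet below is a minimal sketch of such a check; the helper name `is_supported` is made up for illustration and is not part of ChunkHound.

```python
import sys

# Supported range taken from the package metadata (Requires-Python: >=3.10,<3.14).
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 14)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    return MIN_VERSION <= version < MAX_EXCLUSIVE


if __name__ == "__main__":
    # sys.version_info[:2] is the (major, minor) pair of the running interpreter.
    current = sys.version_info[:2]
    status = "supported" if is_supported(current) else "not supported"
    print(f"Python {current[0]}.{current[1]} is {status}")
```

Run it with the interpreter you plan to use; the printed status reflects only the version bounds, not whether the other requirements are installed.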

## Installation

```bash
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install ChunkHound
uv tool install chunkhound
```

## Quick Start

1. Create `.chunkhound.json` in the project root:
```json
{
  "embedding": {
    "provider": "voyageai",
    "api_key": "your-voyageai-key"
  },
  "llm": {
    "provider": "claude-code-cli"
  }
}
```
> **Note:** Use `"codex-cli"` instead if you prefer Codex. Both work equally well and require no API key.
2. Index your codebase
```bash
chunkhound index
```
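
If you would rather not paste an API key into the file by hand, the config from step 1 can be generated by a small script. This is a sketch under stated assumptions: it writes exactly the keys shown in the example above, and the environment variable name `VOYAGE_API_KEY` and the helper names `build_config` and `write_config` are chosen for this illustration, not defined by ChunkHound.

```python
import json
import os
from pathlib import Path


def build_config(api_key: str, llm_provider: str = "claude-code-cli") -> dict:
    """Build the same structure as the Quick Start example."""
    return {
        "embedding": {"provider": "voyageai", "api_key": api_key},
        "llm": {"provider": llm_provider},
    }


def write_config(project_root: Path, api_key: str) -> Path:
    """Write .chunkhound.json into project_root and return its path."""
    config_path = project_root / ".chunkhound.json"
    config_path.write_text(json.dumps(build_config(api_key), indent=2) + "\n")
    return config_path


if __name__ == "__main__":
    # VOYAGE_API_KEY is a name picked for this sketch (an assumption),
    # not a variable the README documents.
    key = os.environ.get("VOYAGE_API_KEY")
    if not key:
        raise SystemExit("Set VOYAGE_API_KEY before running this script")
    print(f"Wrote {write_config(Path.cwd(), key)}")
```

Because the generated file contains the key in plain text, consider adding `.chunkhound.json` to `.gitignore` so it is not committed.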

**For configuration, IDE setup, and advanced usage, see the [documentation](https://chunkhound.github.io).**

## Why ChunkHound?

| Approach | Capability | Scale | Maintenance |
|----------|------------|-------|-------------|
| Keyword Search | Exact matching | Fast | None |
| Traditional RAG | Semantic search | Scales | Re-index files |
| Knowledge Graphs | Relationship queries | Expensive | Continuous sync |
| **ChunkHound** | Semantic + Regex + Code Research | Automatic | Incremental + realtime |

**Ideal for:**
- Large monorepos with cross-team dependencies
- Security-sensitive codebases (local-only, no cloud)
- Multi-language projects needing consistent search
- Offline/air-gapped development environments

## License

MIT
