Metadata-Version: 2.4
Name: codeseek
Version: 0.1.0
Summary: Semantic code search for your repo, as a CLI and an MCP server. Bring any OpenAI-compatible embedding model. Zero dependencies.
Project-URL: Homepage, https://github.com/Sev7nOfNine/codeseek
Project-URL: Repository, https://github.com/Sev7nOfNine/codeseek
Project-URL: Issues, https://github.com/Sev7nOfNine/codeseek/issues
Author: Seven Of Nine
License: MIT
License-File: LICENSE
Keywords: cli,code-search,developer-tools,embeddings,llm,mcp,model-context-protocol,rag,semantic-search
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Provides-Extra: test
Requires-Dist: pytest>=7; extra == 'test'
Description-Content-Type: text/markdown

# codeseek

[![PyPI](https://img.shields.io/pypi/v/codeseek.svg)](https://pypi.org/project/codeseek/)
[![CI](https://github.com/Sev7nOfNine/codeseek/actions/workflows/ci.yml/badge.svg)](https://github.com/Sev7nOfNine/codeseek/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python](https://img.shields.io/pypi/pyversions/codeseek.svg)](https://pypi.org/project/codeseek/)

**Semantic search over your codebase — as a CLI and an MCP server. Zero dependencies.**

`codeseek` indexes a repository into a local vector store and lets you search it
by meaning, not just by string match. Use it from the terminal, or run it as an
[MCP](https://modelcontextprotocol.io) server so an AI coding assistant or editor
can ask your codebase questions directly.

It brings no embedding model of its own: point it at the OpenAI API, or at any
OpenAI-compatible endpoint such as a local `llama.cpp` server, so private code
can be embedded without leaving your machine. Storage is plain SQLite; search is
brute-force cosine. The whole thing is the Python standard library and nothing
else.
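
As a rough illustration of what "plain SQLite" storage can look like with only the standard library, the sketch below packs each vector into a BLOB column. The table name, column names, and float32 packing are assumptions made for this example, not codeseek's actual schema.

```python
import sqlite3
from array import array


def open_store(path=":memory:"):
    """Open (or create) a tiny vector store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " start_line INTEGER NOT NULL,"
        " end_line INTEGER NOT NULL,"
        " vector BLOB NOT NULL)"
    )
    return conn


def put_chunk(conn, path, start_line, end_line, vector):
    """Store one chunk; the vector is packed as 32-bit floats."""
    blob = array("f", vector).tobytes()
    conn.execute(
        "INSERT INTO chunks (path, start_line, end_line, vector)"
        " VALUES (?, ?, ?, ?)",
        (path, start_line, end_line, blob),
    )
    conn.commit()


def iter_chunks(conn):
    """Yield (path, start_line, end_line, vector) for every stored chunk."""
    rows = conn.execute("SELECT path, start_line, end_line, vector FROM chunks")
    for path, start, end, blob in rows:
        vec = array("f")
        vec.frombytes(blob)  # unpack the BLOB back into floats
        yield path, start, end, list(vec)
```

Keeping vectors as opaque blobs means search has to read every row, which matches the brute-force design described above.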

## Install

```bash
pip install codeseek
# or:
pipx install codeseek
```

Requires Python 3.8+.

## Quick start

```bash
export OPENAI_API_KEY=sk-...

# 1. Index the current repository
codeseek index .

# 2. Search it
codeseek search "where do we validate the auth token?"
codeseek search "retry with backoff" -k 3
```

Results come back as markdown, each with a file path, line range, and similarity
score:

````markdown
### src/auth/token.py:40-70  (score 0.812)
```
def verify_token(raw: str) -> Claims:
    ...
```
````

### Use a local or alternative provider

```bash
codeseek index . --base-url http://localhost:8080/v1 --model nomic-embed-text
```
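
For context, an OpenAI-compatible provider is one that accepts a POST to `/embeddings` under the base URL and returns a `data` list of embeddings. The sketch below shows that exchange with `urllib` only. The function names and the injectable `send` parameter are hypothetical and are not codeseek's internal API.

```python
import json
import urllib.request


def build_request(base_url, api_key, model, texts):
    """Build a POST to the OpenAI-style `/embeddings` endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers often need no key
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=body,
        headers=headers,
        method="POST",
    )


def parse_response(payload):
    """Return embeddings in input order from an OpenAI-style response."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(base_url, api_key, model, texts, send=None):
    """Embed a batch of texts; `send` is injectable so tests stay offline."""
    request = build_request(base_url, api_key, model, texts)
    if send is None:
        def send(req):
            # Real network call, used only when no fake is supplied.
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
    return parse_response(send(request))
```

Passing a fake `send` function makes it possible to exercise the request and response handling without a running server.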

## As an MCP server

`codeseek serve` speaks MCP over stdio and exposes one tool, `search_code`.

After indexing a repo, register the server with your MCP client. A typical
`mcpServers` configuration looks like this:

```json
{
  "mcpServers": {
    "codeseek": {
      "command": "codeseek",
      "args": ["serve", "--db", "/path/to/your/repo/.codeseek.db"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}
```

The assistant can then call `search_code` to pull relevant code into its context
on demand, instead of you pasting files by hand.
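
To make the exchange concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool over stdio. The `tools/call` method and the `name`/`arguments` shape come from the MCP specification. The `query` argument name is an assumption, so check the input schema the server advertises via `tools/list`.

```python
import json


def tools_call(request_id, tool, arguments):
    """Serialize an MCP `tools/call` request as one line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport sends one JSON message per line.
    return json.dumps(message) + "\n"


# "query" is an assumed argument name, used here for illustration only.
line = tools_call(1, "search_code", {"query": "retry with backoff"})
print(line, end="")
```

A real client performs the MCP `initialize` handshake before calling any tool; MCP-aware editors and assistants handle that for you.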

## Commands

| Command | What it does |
| --- | --- |
| `codeseek index [PATH]` | Index a directory (default `.`) into `--db`. |
| `codeseek search QUERY` | Search the index; `-k` sets result count. |
| `codeseek serve` | Run the MCP server over stdio. |

Shared options: `--db`, `--model`, `--base-url`, `--api-key` (each with an
environment-variable default).

## How it works

1. **Source** — files are walked and read (sensible code/text extensions, common
   build and vendor directories skipped).
2. **Chunking** — each file is split into overlapping line windows.
3. **Embedding** — chunks are embedded in batches via your provider.
4. **Storage** — vectors land in a local SQLite database.
5. **Search** — your query is embedded and compared against every chunk by cosine
   similarity; the top matches are returned.
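
The chunking step can be sketched as follows. This is a minimal illustration of overlapping line windows, not codeseek's actual implementation: the function name and the `window` and `overlap` defaults are assumptions.

```python
def chunk_lines(text, window=40, overlap=10):
    """Split text into overlapping line windows.

    Returns (start_line, end_line, chunk_text) tuples with 1-based,
    inclusive line numbers. The defaults are illustrative only.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    lines = text.splitlines()
    step = window - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(lines), step):
        piece = lines[start:start + window]
        chunks.append((start + 1, start + len(piece), "\n".join(piece)))
        if start + window >= len(lines):
            break  # this window already reaches the end of the file
    return chunks
```

The overlap keeps a definition that straddles a window boundary fully visible in at least one chunk, at the cost of embedding some lines twice.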

The document source is pluggable: the engine only consumes `Document` objects, so
the same indexing and search machinery can be pointed at things other than code.
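
As a sketch of what a pluggable source might look like, the example below feeds non-code content through the same kind of interface. The `Document` fields shown here (`path`, `text`) and both helper functions are hypothetical stand-ins; the real class may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Document:
    """Hypothetical stand-in for the engine's document type."""
    path: str
    text: str


def markdown_notes(notes: dict) -> Iterator[Document]:
    """A non-code source: yield one Document per note, keyed by title."""
    for title, body in sorted(notes.items()):
        yield Document(path="notes/" + title + ".md", text=body)


def index_documents(docs: Iterable[Document]) -> list:
    """Toy consumer: anything that yields Documents can be indexed."""
    return [(doc.path, len(doc.text.splitlines())) for doc in docs]
```

Because the consumer only iterates over `Document` objects, swapping the file walker for another generator requires no change on the indexing side.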

## Privacy note

Indexing sends file contents to whichever embeddings provider you configure. For
private code, prefer a self-hosted model via `--base-url`.

## Scope

Search is a linear scan over every stored vector, which is plenty fast for a
single repository (a few thousand chunks). Very large monorepos would need an
approximate nearest-neighbor index; that is a natural next step, not today's goal.
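
A linear scan of this kind fits in a few lines of standard-library Python. The sketch below is an illustration under assumed names (`cosine`, `top_k`) and is not codeseek's actual search code.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=5):
    """Score every (label, vector) pair against the query and keep the best k."""
    scored = ((cosine(query, vec), label) for label, vec in chunks)
    return heapq.nlargest(k, scored)  # list of (score, label), best first
```

The cost grows linearly with the number of chunks, since every vector is scored for every query.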

## Development

```bash
pip install -e ".[test]"
python -m pytest
```

All tests run offline; the embedding and HTTP layers accept injectable fakes.

## License

MIT — see [LICENSE](LICENSE).
