Metadata-Version: 2.4
Name: code-finder
Version: 0.1.1
Summary: Code evidence retrieval and grounded review for documentation workflows. AST chunking, hybrid search (BM25 + vector), and API surface extraction.
License-Expression: Apache-2.0
Keywords: documentation,code analysis,code evidence,semantic search,ast,embeddings
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pymilvus>=2.3.0
Requires-Dist: milvus-lite>=2.3.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: rank-bm25
Requires-Dist: numpy>=1.24.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: tree-sitter
Requires-Dist: tree-sitter-python
Requires-Dist: tree-sitter-javascript
Requires-Dist: tree-sitter-typescript
Requires-Dist: tree-sitter-go
Provides-Extra: synthesis
Requires-Dist: anthropic>=0.34.0; extra == "synthesis"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff; extra == "dev"

# code-finder

AST-based code indexing and hybrid search (BM25 + vector) for retrieving code evidence from repositories. Built to answer natural-language questions about a codebase with ranked, source-grounded results.

> **Import name**: The package installs as `code-finder` but the Python import is `claude_context`, not `code_finder`.

## Install

```bash
pip install code-finder
```

Or run ephemerally without installing:

```bash
uv run --with code-finder code-finder-evidence --repo /path/to/repo --query "how does auth work?"
```

## What it does

code-finder parses source code into AST-aware chunks, embeds them with a local sentence-transformer model, and stores them in a Milvus Lite vector database. At query time it runs BM25 keyword search and vector similarity search, merges the two rankings with reciprocal rank fusion, and returns the most relevant code snippets for a natural-language question.
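
To illustrate the fusion step, here is a minimal sketch of reciprocal rank fusion over two ranked lists. It is not code-finder's actual implementation: the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks that
    rank well in more than one list rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs standing in for BM25 and vector search results.
bm25_hits = ["auth.py:login", "auth.py:logout", "config.py:load"]
vector_hits = ["auth.py:login", "session.py:refresh", "auth.py:logout"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Chunks that appear in both lists (`auth.py:login`, `auth.py:logout`) outrank chunks that appear in only one, which is the behavior that makes rank fusion useful when keyword and vector scores are not directly comparable.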

Three capabilities are exposed as both CLI commands and Python functions:

| Capability | CLI command | What it returns |
|---|---|---|
| **Code evidence retrieval** | `code-finder-evidence` | Ranked code snippets matching a query |
| **Code-grounded review** | `code-finder-review` | Per-claim verdicts for a draft document |
| **API surface extraction** | `code-finder-api-surface` | Public classes, functions, and signatures |

## CLI usage

### Code evidence retrieval

Search a repo with a natural-language question:

```bash
code-finder-evidence \
  --repo /path/to/repo \
  --query "how does authentication work?" \
  --limit 5
```

Filter by chunk type or file path:

```bash
code-finder-evidence \
  --repo /path/to/repo \
  --query "error handling" \
  --filter-types function,method \
  --filter-paths src/auth,src/config
```

Force a re-index after code changes:

```bash
code-finder-evidence --repo /path/to/repo --query "config loading" --reindex
```

### Code-grounded review

Validate a draft document's factual claims against the source code:

```bash
code-finder-review \
  --repo /path/to/repo \
  --draft docs/getting-started.md
```

Each claim gets a verdict: `supported`, `partially_supported`, `unsupported`, or `no_evidence_found`.
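
If you want a quick summary of a review, the verdicts can be tallied. The sketch below assumes each claim is a dict with a `verdict` key; that shape, the `tally_verdicts` helper, and the sample claims are hypothetical, so check the actual report structure before relying on it.

```python
from collections import Counter

# The four verdicts a claim can receive.
VERDICTS = ("supported", "partially_supported", "unsupported", "no_evidence_found")


def tally_verdicts(claims):
    """Count claims per verdict, including verdicts with zero claims."""
    counts = Counter(claim["verdict"] for claim in claims)
    return {verdict: counts.get(verdict, 0) for verdict in VERDICTS}


# Hypothetical claims; the real report may use different field names.
sample_claims = [
    {"claim": "The CLI accepts --limit", "verdict": "supported"},
    {"claim": "Indexes are stored in Redis", "verdict": "unsupported"},
    {"claim": "Go files are AST-parsed", "verdict": "supported"},
]

summary = tally_verdicts(sample_claims)
print(summary)
```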

### API surface extraction

Extract the public API from source files. This is deterministic (no LLM, no indexing):

```bash
# Whole package directory
code-finder-api-surface --target src/mypackage/

# Single file
code-finder-api-surface --target src/mypackage/client.py

# Include private members
code-finder-api-surface --target src/mypackage/ --include-private
```

## Python API

```python
from claude_context.skills.evidence_retrieval import retrieve_evidence

results = retrieve_evidence(
    repo_path="/path/to/repo",
    query="how does hybrid search combine BM25 and vector results?",
    limit=10,
    filter_types=["function", "method"],
    filter_paths=["src/auth", "src/config"],
)

# Each result is a dict; print its location, combined score, and signature.
for r in results:
    print(f"{r['file_path']}:{r['start_line']} ({r['combined_score']:.3f})")
    print(f"  {r['signature']}")
```

```python
from claude_context.skills.grounded_review import grounded_review

report = grounded_review(
    repo_path="/path/to/repo",
    draft_path="docs/getting-started.md",
    max_evidence_per_claim=5,
)
```

```python
from claude_context.skills.api_surface import extract_api_surface

surface = extract_api_surface(
    target_path="src/mypackage/",
    languages=["python"],
    include_private=False,
    include_docstrings=True,
)
```

## Index caching

On first run, code-finder builds an index of the repository (AST chunking + embeddings). This takes 1-3 minutes depending on repo size. The index is cached at:

```
{repo}/.vibe2doc/index.db
```

Subsequent runs reuse the cached index. Pass `--reindex` (CLI) or `reindex=True` (Python) after significant code changes. API surface extraction does not use the index.
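
To check whether a repository already has a cached index, you can test for the file at the path above. This is a small sketch using only the standard library; the helper names `index_path` and `has_cached_index` are hypothetical and not part of the package.

```python
from pathlib import Path


def index_path(repo_path):
    """Return the cache location described above: {repo}/.vibe2doc/index.db."""
    return Path(repo_path) / ".vibe2doc" / "index.db"


def has_cached_index(repo_path):
    """Return True if an index has already been built for this repository."""
    return index_path(repo_path).exists()


repo = "/path/to/repo"
if has_cached_index(repo):
    print(f"Reusing cached index at {index_path(repo)}")
else:
    print("No cached index found; the first query will build one")
```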

## Filtering

### Path filtering

Restrict results to specific directories using `--filter-paths` (CLI) or `filter_paths` (Python). Paths are relative to the repo root:

```bash
code-finder-evidence --repo /path/to/repo --query "auth" --filter-paths src/auth,src/middleware
```
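
As a rough mental model, a result passes the filter when its file path lies under one of the given directories. The sketch below matches whole path components, so `src/auth` does not match `src/authz`. The matching rule is an assumption and code-finder's behavior may differ (for example, plain string prefixes); the helper name `matches_filter_paths` is hypothetical.

```python
from pathlib import PurePosixPath


def matches_filter_paths(file_path, filter_paths):
    """Return True if file_path lies under any of the filter directories.

    Paths are repo-relative. An empty filter list matches everything.
    """
    if not filter_paths:
        return True
    path = PurePosixPath(file_path)
    return any(path.is_relative_to(PurePosixPath(p)) for p in filter_paths)


filters = ["src/auth", "src/middleware"]
for candidate in ["src/auth/login.py", "src/authz/roles.py", "src/middleware/cors.py"]:
    print(candidate, matches_filter_paths(candidate, filters))
```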

### Type filtering

Restrict results to specific chunk types using `--filter-types` (CLI) or `filter_types` (Python). The available types are `function`, `method`, `class`, `module`, `import`, and `decorator`.
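
When building a type filter from user input, it can help to validate the names before passing them on. The sketch below turns a comma-separated string, in the style of the CLI's `--filter-types function,method`, into a list and rejects unknown names. The helper `parse_filter_types` is hypothetical; code-finder may handle unknown types differently.

```python
# The chunk types listed above.
CHUNK_TYPES = {"function", "method", "class", "module", "import", "decorator"}


def parse_filter_types(raw):
    """Split a comma-separated string into chunk types, rejecting unknown names."""
    types = [t.strip() for t in raw.split(",") if t.strip()]
    unknown = sorted(set(types) - CHUNK_TYPES)
    if unknown:
        raise ValueError(f"unknown chunk types: {', '.join(unknown)}")
    return types


filter_types = parse_filter_types("function, method")
print(filter_types)
```

The resulting list has the same shape as the `filter_types` argument shown in the Python API section.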

### Language filtering

Restrict to specific languages: `python`, `javascript`, `typescript`, `go`, and others.

## Supported languages

Python, JavaScript, TypeScript, and Go are AST-parsed via tree-sitter. code-finder also indexes Markdown, JSON, YAML, TOML, HTML, CSS, shell scripts, SQL, and other text formats.

## Used by

[redhat-docs-agent-tools](https://gitlab.cee.redhat.com/ccs-internal-tools/redhat-docs-agent-tools) uses code-finder as the backend for its `code-evidence`, `grounded-review`, and `api-surface` skills. If you're using those skills, code-finder is installed automatically as a dependency.

## Origin

code-finder was built from a fork of [claude-context](https://github.com/zilliztech/claude-context/) by Zilliz, which provides Milvus-backed code search for Claude. It was extended within [vibe2doc](https://gitlab.cee.redhat.com/dobrenna/vibe2doc) with enhanced AST chunking, path filtering, grounded review, and API surface extraction, then extracted as a standalone package. The vibe2doc README describes the full doc generation workflow; this package provides only the code analysis and search layer.

## License

Apache-2.0
