Metadata-Version: 2.4
Name: terrain
Version: 2.0.0
Summary: Give your AI a complete map of any codebase — function signatures, call graphs, and semantic search
License: MIT
Requires-Python: >=3.10
Requires-Dist: diff-match-patch>=20230430
Requires-Dist: httpx>=0.25
Requires-Dist: kuzu>=0.4
Requires-Dist: loguru>=0.7
Requires-Dist: mcp>=1.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: requests>=2.28
Requires-Dist: tree-sitter-c
Requires-Dist: tree-sitter-cpp
Requires-Dist: tree-sitter-javascript
Requires-Dist: tree-sitter-python
Requires-Dist: tree-sitter-typescript
Requires-Dist: tree-sitter>=0.22
Provides-Extra: rag
Provides-Extra: semantic
Provides-Extra: treesitter-c
Provides-Extra: treesitter-full
Requires-Dist: tree-sitter-go; extra == 'treesitter-full'
Requires-Dist: tree-sitter-java; extra == 'treesitter-full'
Requires-Dist: tree-sitter-lua; extra == 'treesitter-full'
Requires-Dist: tree-sitter-rust; extra == 'treesitter-full'
Requires-Dist: tree-sitter-scala; extra == 'treesitter-full'
Description-Content-Type: text/markdown

# Terrain

English | [Chinese / CN](README_CN.md)

Give your AI coding assistant a complete map of any codebase — function signatures, call graphs, and semantic search across every line of code.

## The Problem

You drop a 500,000-line codebase in front of Claude Code. It reads what it can see. It guesses what it can't. You get answers that are *almost* right.

Terrain indexes the entire codebase once, then gives your AI a precise, queryable knowledge graph — so it stops guessing.

## What This Looks Like

Ask Claude Code about an unfamiliar codebase:

> "How does the authentication token get refreshed?"

Without Terrain, the AI skims files and makes educated guesses — possibly missing the real implementation buried three call levels deep.

With Terrain:

```
find_api("authentication token refresh")

→ refresh_access_token() in auth/token_manager.c:187
  Signature: int refresh_access_token(TokenCtx *ctx, const char *refresh_token)
  Called by: session_heartbeat() → event_loop_tick() → main()
  Calls:     http_post(), parse_jwt(), update_session_store()
```

Precise. Complete. Instant.

## Quick Start

```bash
npx terrain@latest --setup
```

The setup wizard installs the Python package, configures your LLM and embedding provider, and registers Terrain as a global MCP server for Claude Code. One command.

Then in Claude Code — point it at any repo and ask anything.

## Index a Codebase

```bash
terrain index /path/to/your/repo
```

Takes a few minutes the first time. Incremental updates after that:

```bash
terrain index -i   # git-diff based, fast
```

## What You Can Ask

| You want to know... | What to ask |
|---|---|
| Where does X get initialized? | "find where X is initialized" |
| What calls this function? | "find callers of function\_name" |
| How does feature Y work end-to-end? | "trace the call chain for Y" |
| What functions handle Z? | "find Z handler" |

## Supported Languages

C/C++, Python, JavaScript/TypeScript, Rust, Go, Java, Scala, C#, PHP, Lua

---

## Reference

### Install

```bash
# Recommended
npx terrain@latest --setup

# Or via pip
pip install terrain
terrain-mcp  # Start MCP server
```

### Uninstall

```bash
npx terrain@latest --uninstall
```

Removes: Claude MCP registration, Python package, workspace data.

### MCP Client Configuration

Add to your MCP client config (Claude Code, Cursor, Windsurf, etc.):

```json
{
  "mcpServers": {
    "terrain": {
      "command": "npx",
      "args": ["-y", "terrain@latest", "--server"]
    }
  }
}
```

On Windows:

```json
{
  "mcpServers": {
    "terrain": {
      "command": "cmd",
      "args": ["/c", "npx", "-y", "terrain@latest", "--server"]
    }
  }
}
```

### CLI Tool (\`terrain\`)

#### Workspace

```bash
terrain status              # Show active repository, workspace, LLM & embedding info
terrain list                # List all indexed repositories
terrain repo                # Interactively switch active repository
terrain config              # Interactive configuration wizard (LLM, embedding, workspace)
terrain link <path>         # Link a local repo to shared pre-built artifacts
terrain link <path> --db x  # Link to a specific artifact directory
```

#### Indexing

```bash
terrain index               # Index current directory (graph → api-docs → embeddings)
terrain index /path/to/repo # Index a specific path
terrain index -i            # Incremental update (git-diff based, fast)
terrain index --no-embed    # Skip embedding generation
terrain index --no-wiki     # Skip wiki generation only
```

#### Rebuild & Clean

```bash
terrain rebuild             # Rebuild all steps for active repository
terrain rebuild --step graph   # Rebuild only the graph
terrain rebuild --step api     # Rebuild only API docs
terrain rebuild --step embed   # Rebuild only embeddings
terrain rebuild --step wiki    # Rebuild only wiki

terrain clean               # Remove indexed data (interactive)
terrain clean repo_name     # Remove specific repository
terrain clean --all         # Remove all indexed repositories
```

#### Low-Level Commands

```bash
terrain scan /path          # Scan repo and build knowledge graph
  --backend kuzu|memgraph|memory
  --db-path ./graph.db
  --exclude "vendor,build"
  --language "c,python"
  --clean               # Clean DB before scanning
  -o graph.json         # Export graph to JSON

terrain query "MATCH (f:Function) RETURN f.name LIMIT 10"
  --format table|json

terrain export /path -o graph.json
  --build               # Build graph before exporting

terrain stats               # Show graph statistics (nodes, relationships)
```

#### Global Flags

```bash
terrain --version           # Show version
terrain -v ...              # Verbose/debug output
terrain --help              # Show help
```

### MCP Tools

**Core workflow for AI agents:** `initialize_repository` → `find_api` → `get_api_doc`

#### Repository Management

| Tool | Description |
|---|---|
| `initialize_repository` | Index a repo: graph + API docs + embeddings |
| `get_repository_info` | Active repo stats (node/relationship counts, service status) |
| `list_repositories` | All indexed repos with pipeline completion status |
| `switch_repository` | Switch active repo for queries |
| `link_repository` | Reuse existing index for a different repo path (no re-indexing) |

#### Code Search & Documentation

| Tool | Description |
|---|---|
| `find_api` | Hybrid semantic + keyword search with API doc (primary search tool) |
| `list_api_docs` | Browse L1 module index or L2 module details |
| `get_api_doc` | L3 function detail: signature, call tree, usage examples, source |
| `generate_api_docs` | Generate/update API docs (full / resume / enhance) |

#### Call Graph Analysis

| Tool | Description |
|---|---|
| `find_callers` | Find all functions that call a specific function (no LLM required) |
| `trace_call_chain` | BFS upward call chain trace with entry point discovery |

#### Configuration & Maintenance

| Tool | Description |
|---|---|
| `get_config` | Show server configuration and service availability |
| `rebuild_embeddings` | Build or rebuild vector embeddings |

### Pipeline

| Step | What | Input | Output |
|---|---|---|---|
| 1. graph-build | Tree-sitter AST parsing | Source code | Kuzu graph database |
| 2. api-doc-gen | Query graph, render docs | Graph | 3-level Markdown (index / module / function) |
| 2b. desc-gen | LLM generates descriptions | Functions without docstrings | Descriptions in L3 Markdown |
| 3. embed-gen | Vectorize function docs | L3 Markdown files | Vector store (pickle) |

```
initialize_repository  ->  Steps 1-3 (full pipeline)
build_graph            ->  Step 1 only
generate_api_docs      ->  Step 2 + 2b (modes: full / resume / enhance)
rebuild_embeddings     ->  Step 3
generate_wiki          ->  Separate (not in main pipeline)
```

### API Documentation Format

Generated docs are optimized for both AI agent reading and vector retrieval.

#### L3 Function Detail (embedding unit)

```markdown
# parse_btype

> Parse base type declaration including struct/union/enum specifiers.

- Signature: `int parse_btype(CType *type, AttributeDef *ad, int ignore_label)`
- Return: `int`
- Visibility: static | Header: tccgen.h
- Location: tccgen.c:139-280
- Module: tinycc.tccgen --C code generator

## Call Tree

parse_btype
|-- expr_const           [static]
|-- parse_btype_qualify   [static]
|-- struct_decl           [static]
|   |-- expect
|   `-- next
`-- parse_attribute       [static]

## Called by (5)

- type_decl (tinycc.tccgen) -> tccgen.c:1200
- post_type (tinycc.tccgen) -> tccgen.c:1350
```

#### C/C++ Specific Features

- Extracts `//` and `/* */` comments above functions as descriptions
- Struct/union/enum members displayed with types
- Macro definitions in dedicated section
- Static/public/extern visibility classification
- Memory ownership inference from signatures
- Header/implementation file split
- Cross-file function call resolution via `#include` header mapping
- Function pointer tracking and indirect call resolution
- GB2312/GBK encoding support for source files

### Supported Languages (detail)

| Language | Functions | Classes/Structs | Calls | Imports | Types |
|---|---|---|---|---|---|
| C / C++ | Yes | struct, union, enum, typedef, macro | Yes | #include | Yes |
| Python | Yes | Yes | Yes | Yes | - |
| JavaScript / TypeScript | Yes | Yes | Yes | Yes | - |
| Rust | Yes | struct, enum, trait, impl | Yes | Yes | - |
| Go | Yes | struct, interface | Yes | Yes | - |
| Java | Yes | class, interface, enum | Yes | Yes | - |
| Scala | Yes | class, object | Yes | Yes | - |
| C# | Yes | class, namespace | Yes | - | - |
| PHP | Yes | class | Yes | - | - |
| Lua | Yes | - | Yes | - | - |

### Graph Schema

**Nodes**: `Project`, `Package`, `Module`, `File`, `Folder`, `Class`, `Function`, `Method`, `Type`, `Enum`, `Union`

**Relationships**: `CONTAINS_*`, `DEFINES`, `DEFINES_METHOD`, `CALLS`, `INHERITS`, `IMPLEMENTS`, `IMPORTS`, `OVERRIDES`

**Properties**: `qualified_name` (PK), `name`, `path`, `start_line`, `end_line`, `signature`, `return_type`, `visibility`, `parameters`, `kind`, `docstring`

### Architecture

The project follows a 5-layer harness architecture:

```
L4  entrypoints/         MCP server, CLI
L3  domains/upper/       apidoc, rag, guidance, calltrace
L2  domains/core/        graph, embedding, search
L1  foundation/          parsers, services, utils
L0  foundation/types/    constants, models, type definitions
```

### Environment Variables

#### LLM (first match wins)

| Variable | Purpose | Default |
|---|---|---|
| `LLM_API_KEY` | Generic LLM key (highest priority) | - |
| `LLM_BASE_URL` | API endpoint | `https://api.openai.com/v1` |
| `LLM_MODEL` | Model name | `gpt-4o` |
| `OPENAI_API_KEY` | OpenAI or compatible | - |
| `MOONSHOT_API_KEY` | Moonshot / Kimi (legacy) | - |

#### Embedding

| Variable | Purpose | Default |
|---|---|---|
| `DASHSCOPE_API_KEY` | DashScope (Qwen3 Embedding) | - |
| `DASHSCOPE_BASE_URL` | DashScope endpoint | `https://dashscope.aliyuncs.com/api/v1` |

#### System

| Variable | Purpose | Default |
|---|---|---|
| `TERRAIN_WORKSPACE` | Workspace directory | `~/.terrain` |

### Installation Options

#### Install from PyPI

```bash
# Core (includes C/C++, Python, JS/TS grammars)
pip install terrain

# With all language grammars (Rust, Go, Java, Scala, Lua)
pip install "terrain[treesitter-full]"
```

#### Install from local source

```bash
git clone https://github.com/JeremyJiao01/CodeGraphWiki.git
cd CodeGraphWiki

# Install with all language grammars
pip install ".[treesitter-full]"

# Or install in editable mode for development
pip install -e ".[treesitter-full]"
```

#### Build and install from wheel

```bash
git clone https://github.com/JeremyJiao01/CodeGraphWiki.git
cd CodeGraphWiki

python3 -m build
pip install dist/terrain-*.whl
```

### Development

```bash
git clone https://github.com/JeremyJiao01/CodeGraphWiki.git
cd CodeGraphWiki
pip install -e ".[treesitter-full]"

python3 -m pytest tests/ -v

# Integration tests (requires tinycc repo at ../tinycc)
python3 -m pytest tests/domains/core/test_graph_build.py -v      # ~3 min
python3 -m pytest tests/domains/upper/test_api_docs.py -v        # ~3 min
python3 -m pytest tests/domains/core/test_step3_embedding.py -v  # ~27 min (API calls)
python3 -m pytest tests/domains/upper/test_api_find_integration.py -v  # ~47 min (full pipeline)
```

## License

MIT
