Metadata-Version: 2.4
Name: grounded-index
Version: 0.5.0
Summary: Offline source code indexer with token-bounded context queries
Author: Luiz Spies
License: GPL-3.0-only
License-File: LICENSE
Keywords: cli,code-index,sqlite,static-analysis
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Requires-Dist: tree-sitter-c>=0.24
Requires-Dist: tree-sitter-cpp>=0.23
Requires-Dist: tree-sitter-java>=0.23
Requires-Dist: tree-sitter-javascript>=0.25
Requires-Dist: tree-sitter-python>=0.25
Requires-Dist: tree-sitter-rust>=0.24
Requires-Dist: tree-sitter-typescript>=0.23
Requires-Dist: tree-sitter>=0.25
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Provides-Extra: external-cfg
Requires-Dist: pydot>=3; extra == 'external-cfg'
Provides-Extra: test
Requires-Dist: pytest>=8; extra == 'test'
Description-Content-Type: text/markdown

# grounded-index

[![Version](https://img.shields.io/badge/version-0.5.0-blue.svg)](CHANGELOG.md)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](pyproject.toml)
[![License: GPL-3.0](https://img.shields.io/badge/license-GPL--3.0-blue.svg)](LICENSE)

**Offline source code indexer with token-bounded context queries.**

`grounded-index` walks a repository, parses sources with `tree-sitter`, and writes
files, symbols, imports, and references to a SQLite database. From there it answers
focused questions — *what symbols exist, who calls them, which tests reference them,
what does this look like in 4 000 tokens* — without your tools ever needing to read
the full repository.
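
To make the idea concrete, here is a minimal sketch of answering "who calls this symbol" from a SQLite index using only the standard library. The table and column names (`symbols`, `refs`, `caller_id`, `callee_name`) and the helper `callers_of` are assumptions made for illustration; they are not the actual `grounded-index` schema or API.

```python
import sqlite3

# Hypothetical schema for illustration only; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE symbols (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        path TEXT NOT NULL,
        line INTEGER NOT NULL,
        is_test INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE refs (
        caller_id INTEGER NOT NULL REFERENCES symbols(id),
        callee_name TEXT NOT NULL,
        line INTEGER NOT NULL
    );
    CREATE INDEX idx_refs_callee ON refs(callee_name);
    """
)

# Made-up rows standing in for what an indexer would extract.
conn.executemany(
    "INSERT INTO symbols (id, name, path, line, is_test) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "parse_references", "src/parser.py", 10, 0),
        (2, "index_file", "src/indexer.py", 42, 0),
        (3, "test_parse_references", "tests/test_parser.py", 5, 1),
    ],
)
conn.executemany(
    "INSERT INTO refs (caller_id, callee_name, line) VALUES (?, ?, ?)",
    [(2, "parse_references", 57), (3, "parse_references", 8)],
)


def callers_of(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Return (caller name, path, call line, is_test) for each call site."""
    return conn.execute(
        """
        SELECT s.name, s.path, r.line, s.is_test
        FROM refs AS r
        JOIN symbols AS s ON s.id = r.caller_id
        WHERE r.callee_name = ?
        ORDER BY s.path, r.line
        """,
        (name,),
    ).fetchall()


for row in callers_of(conn, "parse_references"):
    print(row)
```

The point of the sketch is that a caller lookup becomes one indexed query over a few rows, rather than a scan of every source file.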

It is designed for AI coding agents, code-review tooling, and bank-friendly review
workflows: no network calls, no model calls in the indexing path, no source mutation.

---

## Why

Language-model coding assistants waste tokens and lose accuracy when they have to `cat` whole
files to discover basic repository structure. `grounded-index` exposes the same
evidence a developer uses to navigate — symbols, callers, tests, imports — as
compact CLI output and a small Python API.

## Features

- **Seven languages on tree-sitter**: Python, Rust, TypeScript, JavaScript, Java, C, C++.
- **Symbols, imports, references** extracted per language with stable line/col spans.
- **Test detection** (`is_test` column) by path heuristics, decorators / attributes,
  and naming conventions — `pytest` `test_*` / `Test*`, Rust `#[test]` / `#[cfg(test)]`,
  Jest `*.test.ts` / `*.spec.ts`, Java `@Test` annotations, C/C++ `_test.c` suffix.
- **SQLite schema** with FTS5 symbol search and proper indexes.
- **Token-bounded context packs** for AI prompts (`context` command + `BudgetEnforcer`).
- **JSON / Markdown / human output** for each command.
- **Read-only by default**; indexing requires the explicit `--write` flag.
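
The path and naming part of test detection can be sketched in a few lines. This is an illustrative approximation of the conventions listed above, not the project's implementation: the function names are hypothetical, and the decorator and attribute checks (`#[test]`, `@Test`) are left out because they need the parsed syntax tree.

```python
import re
from pathlib import PurePosixPath

# File-name conventions taken from the feature list; the regexes themselves
# are an assumption about how such rules could be written.
_TEST_FILE_PATTERNS = [
    re.compile(r"^test_.*\.py$"),         # pytest module naming
    re.compile(r".*\.(test|spec)\.ts$"),  # Jest naming
    re.compile(r".*_test\.c$"),           # C suffix convention
]


def looks_like_test_file(path: str) -> bool:
    """True if the file name matches one of the test-file conventions."""
    name = PurePosixPath(path).name
    return any(pattern.match(name) for pattern in _TEST_FILE_PATTERNS)


def looks_like_pytest_symbol(name: str) -> bool:
    """True for pytest-style names: test_* functions or Test* classes."""
    return name.startswith("test_") or name.startswith("Test")


for candidate in ("tests/test_parser.py", "src/app.spec.ts", "src/parser.py"):
    print(candidate, looks_like_test_file(candidate))
```

A real detector would combine several signals, since a name match alone can misclassify helpers that happen to start with `Test`.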

## Install

```bash
pip install grounded-index
```

Or from a local checkout:

```bash
git clone <repo> grounded-index
cd grounded-index
pip install -e .
```

## 30-second example

```bash
# Index this repository
grounded-index --write index

# List symbols matching a name
grounded-index symbols --name parse

# Show who calls a symbol
grounded-index references --symbol parse_references --direction in

# Build a 4 000-token context pack
grounded-index context --symbol Indexer --budget 4000 --include-callers
```
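
What "token-bounded" means in practice can be sketched as a greedy packer. Everything here is an assumption for illustration: `estimate_tokens` uses a rough four-characters-per-token heuristic, and `pack_snippets` is a hypothetical helper, not the `BudgetEnforcer` API or the tokenizer the tool uses.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); an assumption,
    # not the estimate grounded-index applies.
    return max(1, (len(text) + 3) // 4)


def pack_snippets(snippets: list[str], budget: int) -> list[str]:
    """Keep snippets in priority order while the running total fits the budget."""
    packed: list[str] = []
    used = 0
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            continue  # too large; a smaller, lower-priority snippet may still fit
        packed.append(snippet)
        used += cost
    return packed


# Snippets are listed highest priority first.
snippets = ["a" * 40, "b" * 400, "c" * 20]
kept = pack_snippets(snippets, budget=20)
print([len(s) for s in kept])
```

Skipping an oversized snippet instead of stopping at it trades strict priority order for better use of the budget; either policy is a reasonable design choice.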

## Tier 2 — external CFG/ICFG tools (optional)

`grounded-index` ships two standalone scripts that produce **control-flow graphs
and inter-procedural call graphs** from compiled code by delegating to LLVM
(C/C++, at the LLVM IR level) and Soot (Java, at the bytecode level). Both
scripts emit the same JSON shape for all three languages.

| Script | Languages | Backend |
|--------|-----------|---------|
| `grounded_clang_cfg.py` | C, C++ | `clang -emit-llvm` + `opt -passes='dot-callgraph,dot-cfg'` + pydot |
| `grounded_java_cfg.py` | Java | `javac` + Soot 4.7.1 (`BriefBlockGraph` + CHA call graph) |

```bash
# Install the optional dependency (quoted so shells such as zsh do not expand the brackets)
pip install "grounded-index[external-cfg]"

# C / C++ (requires clang + opt on PATH)
python grounded_clang_cfg.py src/calculator.c

# Java (requires JDK 11+; first run downloads Soot ~12 MB)
tools/download-soot.sh
python grounded_java_cfg.py src/Calculator.java --class-name Calculator
```

These tools are **not imported by the core indexer** — they're operator-facing
utilities for ICFG-aware analyses. See [MANUAL.md](MANUAL.md#external-cfgicfg-tools)
for usage and [docs/external-cfg-tools.md](docs/external-cfg-tools.md) for the
tier model and limitations.
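
Because both scripts depend on external executables, a wrapper may want to check for them before running. The following is a minimal sketch using only the standard library; `missing_tools` is a hypothetical helper and not part of `grounded-index`.

```python
import shutil


def missing_tools(names: list[str]) -> list[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Executables named in this README for each backend.
C_TOOLS = ["clang", "opt"]
JAVA_TOOLS = ["javac"]

for label, tools in (("C/C++", C_TOOLS), ("Java", JAVA_TOOLS)):
    missing = missing_tools(tools)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{label}: {status}")
```

This only confirms that the executables exist; it does not check their versions or that Soot has been downloaded.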

## Documentation

| Doc | For |
|-----|-----|
| [MANUAL.md](MANUAL.md) | End-user / operator guide — full CLI reference, workflows, troubleshooting. |
| [API.md](API.md) | Programmatic Python API — `Indexer`, `QueryEngine`, parsers, schema. |
| [docs/external-cfg-tools.md](docs/external-cfg-tools.md) | Tier 2 external CFG/ICFG tools — design, dependencies, JSON schema, limitations. |
| [CHANGELOG.md](CHANGELOG.md) | Release history. |
| [vision.md](vision.md) | Product principles and positioning. |

## Status

Alpha — schema v2, seven languages, 195 tests. Tier 2 external CFG/ICFG tools
ship alongside the indexer. CLI surface is stable enough for internal use;
expect occasional additions before 1.0.

## License

GPL-3.0-only. See [LICENSE](LICENSE) for the full text and
[pyproject.toml](pyproject.toml) for the canonical declaration.
