Metadata-Version: 2.4
Name: codetool-explore
Version: 0.6.0
Summary: Fast, dependency-free workspace search, read, and list exploration for coding-agent tools with Rust backend
Project-URL: Homepage, https://github.com/pbi-agent/codetool-explore
Project-URL: Repository, https://github.com/pbi-agent/codetool-explore
Project-URL: Issues, https://github.com/pbi-agent/codetool-explore/issues
Project-URL: Changelog, https://github.com/pbi-agent/codetool-explore/releases
Author-email: drod <naceur.bs@gmail.com>
Maintainer-email: drod <naceur.bs@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: agent,code-search,developer-tools,explore,file-search,filesystem,rust,search,text-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Classifier: Typing :: Typed
Requires-Python: >=3.12
Description-Content-Type: text/markdown

# codetool-explore

`codetool-explore` is a workspace exploration library built for coding-agent harnesses: fast content search, fast filename/path discovery, read-only file viewing, compact structured results, and predictable token usage.

- **Agent-first API**: one public `explore()` call with `target="content"`, `"path"`, `"content_or_path"`, `"read"`, or `"list"`.
- **Performance-oriented**: dependency-free Python fallback plus optional Rust CLI acceleration for literal and regex content/path search.
- **Token-compressed output**: compact result keys by default for search, tree-compressed text by default for list, plain text by default for read, `result_format="text"` for raw RTK-style text, and `result_format="full"` for the uncompressed backend shape.

```python
from codetool_explore import explore

content = explore("UserService", root=".", mode="files")
paths = explore("service", root=".", target="path", glob="*.py")
mixed = explore("UserService", root=".", target="content_or_path")
scoped = explore("search_workspace", root=["src", "webapp", "tests"], regex=False)
snippet = explore("README.md", root=".", target="read", start_line=20, limit=40)
listing = explore("src", root=".", target="list", limit=100)
```

Patterns are regexes by default, so alternation works without extra flags:

```python
explore("Maximum number of results|Text or regex pattern", root="tests")
```

Pass `regex=False` for exact literal matching.

For maximum token compression, request raw text:

```python
print(explore("UserService", root=".", regex=False, result_format="text"))
```

Raw text omits backend/totals metadata, groups repeated path prefixes in a small
tree, crops long snippets/context aggressively, and prints `No Match` for empty
results. It includes a compact pagination header only when another page exists:

```text
-- more: cursor=50
src/
 a.py
```
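
A harness that consumes raw text needs to separate that header from the body before displaying results or requesting the next page. The following is a minimal sketch, not part of the package: `split_more_header` and `collect_pages` are hypothetical helper names, and the sketch assumes the header is always the first line and that the value after `cursor=` can be passed back as the `cursor` argument.

```python
import re
from typing import Callable, Optional

# Assumption: the header is the first line and looks like "-- more: cursor=50".
_MORE_HEADER = re.compile(r"^-- more: cursor=(\S+)$")


def split_more_header(raw: str) -> tuple[Optional[str], str]:
    """Return (next_cursor, body); next_cursor is None on the last page."""
    first, _, rest = raw.partition("\n")
    match = _MORE_HEADER.match(first)
    if match is None:
        return None, raw
    return match.group(1), rest


def collect_pages(fetch: Callable[[Optional[str]], str], max_pages: int = 20) -> list[str]:
    """Call fetch(cursor) until a page arrives without a pagination header."""
    bodies: list[str] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        cursor, body = split_more_header(fetch(cursor))
        bodies.append(body)
        if cursor is None:
            break
    return bodies
```

With the package installed, `fetch` could wrap a call such as `explore("UserService", root=".", result_format="text", cursor=cursor)`. The sketch keeps the cursor as the string found in the header; whether `explore()` expects that exact type is not confirmed here.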

Raw mode grammar:

- `mode="files"`: matching filenames only.
- `mode="count"`: `path xN`, where `N` is the per-file count.
- `mode="snippets"`: `path:line:text` without context, or tree-grouped files
  where `line:text` marks a match and other indented text is surrounding context.
  With `target="content_or_path"`, path-only matches are returned as filename rows.

## API

```python
explore(
    pattern,
    root=".",               # path, file, or non-empty list/tuple of paths
    target="content",       # "content", "path", "content_or_path", "read", or "list"
    regex=True,             # set False for literal search
    path_scope="path",      # "path" or "basename" for path matching
    glob=None,
    exclude=None,
    case="smart",
    mode="files",           # "files", "snippets", or "count"
    context_lines=0,
    limit=50,
    cursor=None,
    start_line=1,           # first line for target="read"
    backend="auto",         # "auto", "python", "rust"/"native"
    result_format=None,     # default compressed for search, text for read, tree text for list
)
```

`target="content"` searches file contents. `target="path"` searches relative
file paths without opening file contents. `target="content_or_path"` returns
files matching either target and marks each row with its match kind.
`mode="snippets"` supports `target="content"` and `target="content_or_path"`;
path-only rows under `target="content_or_path"` are returned without
line/snippet fields.

`target="read"` treats `pattern` as one known file path, resolves relative paths
under each root, and returns plain text with no line-number prefixes. When more
than one file is read, each file is prefixed by a compact `path:` header. Use
`start_line` and `limit` to cap the returned line range; if more lines remain,
text output starts with `-- more: cursor=N`. CSV files are read as ordinary
text. Binary-looking, missing, unreadable, or directory paths fail with
controlled `ExploreError` subclasses.

`target="list"` treats `pattern` as one file/directory path and returns one
directory level under each root. Text output uses the same compact tree display
as raw search output when that saves tokens, appending each root's listing in
root order. Directories end with `/`; a file path is returned as a single entry.
Listing honors multi-root common-base paths, `glob`, `exclude`, ignore files,
`limit`, and `cursor`.

`target="read"` and `target="list"` always use the pure-Python stdlib
implementation, even when `backend="auto"` or `backend="rust"` is requested.
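
To make the one-level listing rule concrete, here is a stdlib-only sketch. It is an illustration rather than the package's code: `list_one_level` is a hypothetical name, the sorted ordering is an assumption, and `glob`, `exclude`, ignore files, `limit`, and `cursor` are deliberately left out.

```python
import os


def list_one_level(path: str) -> list[str]:
    """List a directory's direct children; a file path yields one entry."""
    if os.path.isfile(path):
        return [os.path.basename(path)]
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            # Directories get a trailing "/" so they stand apart from files.
            name = entry.name + "/" if entry.is_dir() else entry.name
            entries.append(name)
    # Assumption: sorted order; the package's actual ordering is not stated here.
    return sorted(entries)
```

Nested files never appear in the result because only the direct children of `path` are scanned.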

`backend="auto"` uses the Rust helper when present and otherwise falls back to pure Python. Regex searches run in Rust when its regex engine supports the pattern; otherwise they fall back to Python for compatibility. That compatibility includes matching Python `re.finditer` counts for patterns that can match empty spans.
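
The empty-span case is easy to overlook, so here is a small stdlib example of what Python itself counts. `python_match_count` is a hypothetical helper for illustration; it shows only Python's behavior and says nothing about how the Rust helper counts.

```python
import re


def python_match_count(pattern: str, text: str) -> int:
    """Count matches the way Python's re.finditer reports them."""
    return sum(1 for _ in re.finditer(pattern, text))


# "a*" can match the empty string, so empty spans are counted too:
# one before "b", the run "aaa", and one at the end of the text.
print(python_match_count(r"a*", "baaa"))  # 3
print(python_match_count(r"a+", "baaa"))  # 1
```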

`root` accepts `str | os.PathLike | Sequence[str | os.PathLike]`. It may be a
workspace directory, a single file, or a non-empty list/tuple of directories and
files:

```python
explore("search_workspace", root=["src", "webapp", "tests"], regex=False)
```

When calling through JSON/tool schemas, pass multi-root values as a JSON array,
for example `"root": ["src", "webapp", "tests"]`. For resilience with coding
agents, a space-delimited string such as `"root": "src webapp tests"` is also
treated as multiple roots when that exact path does not exist and every split
token is an existing file or directory. Existing paths with spaces still take
priority; quote individual spaced paths if combining them in one string.
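
The fallback rule above can be sketched as follows. This is a hedged illustration, not the package's implementation: `split_roots` and its `base` parameter are hypothetical, and using `shlex.split` for quoted tokens is an assumption that suits POSIX-style paths.

```python
import os
import shlex
from typing import Union


def split_roots(root: str, base: str = ".") -> Union[str, list[str]]:
    """Hypothetical helper mirroring the documented fallback rule."""
    # An existing path wins, even if it contains spaces.
    if os.path.exists(os.path.join(base, root)):
        return root
    try:
        # shlex honors quotes, so 'src "my docs"' yields two tokens.
        tokens = shlex.split(root)
    except ValueError:
        # Unbalanced quotes: leave the value untouched.
        return root
    # Split only when there are several tokens and every one of them exists.
    if len(tokens) > 1 and all(os.path.exists(os.path.join(base, t)) for t in tokens):
        return tokens
    return root
```

A value such as `"src missing"` stays a single root in this sketch, because one of its tokens does not exist.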

File roots search/read only that file and report paths relative to the file's
parent directory; listing a file path returns one file entry. Multi-root
searches/reads/listings report paths relative to the roots' common base, so
sibling roots keep prefixes such as `src/...` and `tests/...`; this also lets
`exclude=["src/generated/**"]` target one root.
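
One way to picture the common-base rule is with `os.path.commonpath`. The sketch below is an assumption about how such paths could be derived, not the package's code: `common_base` and `reported_path` are hypothetical names, and treating a file root as contributing its parent directory is a guess that matches the single-file case described above.

```python
import os


def common_base(roots: list[str]) -> str:
    """Directory that reported paths are assumed to be relative to."""
    absolute = [os.path.abspath(r) for r in roots]
    # Assumption: a file root contributes its parent directory to the base.
    dirs = [os.path.dirname(p) if os.path.isfile(p) else p for p in absolute]
    return os.path.commonpath(dirs)


def reported_path(path: str, roots: list[str]) -> str:
    """Path relative to the common base, written with forward slashes."""
    rel = os.path.relpath(os.path.abspath(path), common_base(roots))
    return rel.replace(os.sep, "/")
```

With sibling roots `src` and `tests`, the base is their shared parent, so a file under `src` keeps its `src/` prefix; with `src` as the only root, the prefix disappears.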

Controlled failures raise `ExploreError` subclasses:

- `ExploreArgumentError` for invalid arguments.
- `ExplorePatternError` for invalid/unsupported patterns.
- `ExploreRootError` for missing or unsearchable roots.
- `ExploreBackendError` for backend runtime failures.
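
An agent harness may want to turn these failures into short, actionable hints. The sketch below is illustrative only: the classes are local stand-ins that mirror the documented names so the example runs on its own (the real import path is not shown in this README), and `describe_failure` with its suggested hints is a hypothetical policy, not package behavior.

```python
# Stand-ins mirroring the documented hierarchy; with the package installed,
# import the real classes instead of defining these.
class ExploreError(Exception): ...
class ExploreArgumentError(ExploreError): ...
class ExplorePatternError(ExploreError): ...
class ExploreRootError(ExploreError): ...
class ExploreBackendError(ExploreError): ...


def describe_failure(exc: ExploreError) -> str:
    """Turn a controlled failure into a short hint an agent can act on."""
    if isinstance(exc, ExplorePatternError):
        return "pattern rejected: retry with regex=False or a simpler pattern"
    if isinstance(exc, ExploreRootError):
        return "root missing or unsearchable: check the path"
    if isinstance(exc, ExploreArgumentError):
        return "invalid argument: fix the call"
    if isinstance(exc, ExploreBackendError):
        return 'backend failure: retry with backend="python"'
    # Any other ExploreError falls through to a generic message.
    return "explore failed: " + str(exc)
```

Catching the shared `ExploreError` base class and dispatching on the subclass keeps the harness code in one place while still distinguishing the four documented cases.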

## CLI

```bash
codetool-explore "UserService" . --literal --format text
codetool-explore "service" . --target path --literal
codetool-explore "User(Service|Repository)" --root src --mode snippets --raw
codetool-explore "search_workspace" --root src --root webapp --root tests --literal
codetool-explore --read README.md --start-line 20 --limit 40
codetool-explore --read settings.py --root src --root tests
codetool-explore --list . --root src --root tests --glob "*.py"
```

The CLI defaults to compact JSON for search, plain text for `--read`, and
tree-compressed text for `--list`. Use `--format text` or `--raw` for raw search
text; a search with no matches prints `No Match`. Repeat `--root` for multiple
search/read/list roots. A single quoted space-delimited `--root` is accepted as
a compatibility fallback when it can be split into existing roots.

## Install

```bash
uv pip install codetool-explore
```

Wheels can include a platform-specific Rust helper. Without it, the package still works through the Python stdlib backend.

## Benchmarks

Reproduce and refresh the generated README data:

```bash
cargo build --release --manifest-path rust/Cargo.toml
uv run python benchmarks/benchmark_search.py \
  --output reports/search_benchmark.json \
  --update-readme
uv run python benchmarks/benchmark_output_lengths.py \
  --output reports/rtk_vs_codetool_output_lengths.json
uv run python scripts/update_readme_benchmarks.py \
  --performance reports/search_benchmark.json \
  --tokens reports/rtk_vs_codetool_output_lengths.json
```

<!-- benchmark-results:start -->

<!-- Generated by scripts/update_readme_benchmarks.py; do not edit by hand. -->

### Execution performance

Mean of median wall-clock timings across 5 corpora × 7 scenarios, 5 measured rounds after 1 warmup.

| Tool | Mean median time | Chart |
| --- | ---: | --- |
| `codetool-explore` | 127.0 ms | ███████████░░░░░░░ |
| `rg` | 138.2 ms | ████████████░░░░░░ |
| `rtk` | 199.7 ms | ██████████████████ |

`codetool-explore` is the fastest tool in this run.

Source: `reports/search_benchmark.json`.

### Token compression

Token counts use `tiktoken` when available. The table compares output across 7 RTK-corpus scenarios.

| Output | Tokens | Bytes | Chart |
| --- | ---: | ---: | --- |
| `codetool-explore` | 11,008 | 34.3 KB | ██░░░░░░░░░░░░░░░░ |
| `rtk grep` stdout | 19,646 | 60.1 KB | ███░░░░░░░░░░░░░░░ |
| `rg` stdout | 129,775 | 402.4 KB | ██████████████████ |

`codetool-explore` is raw text from `explore(..., result_format="text")`; it omits backend/totals metadata, includes only a cursor hint when truncated, and prints `No Match` for empty pages. It is 0.56× the `rtk grep` token count in this run.

Source: `reports/rtk_vs_codetool_output_lengths.json`.

<!-- benchmark-results:end -->

## Development

```bash
uv run pytest
uv run python scripts/package_rust_binary.py
uv build --wheel
```

Release wheels are built in CI with the staged Rust helper for each target platform.
