Metadata-Version: 2.4
Name: code-context-py
Version: 1.0.3
Summary: Language-agnostic code context extraction from Git diffs for LLM agents.
Author: Mlinaresweb
License: MIT
Project-URL: Homepage, https://github.com/mlinaresweb/code-context-py
Project-URL: Repository, https://github.com/mlinaresweb/code-context-py
Project-URL: Issues, https://github.com/mlinaresweb/code-context-py/issues
Keywords: diff,code-context,llm,github,documentation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Documentation
Classifier: Topic :: Software Development :: Version Control :: Git
Classifier: Topic :: Text Processing :: Markup
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# code-context-py

`code-context-py` turns Git and GitHub diffs into structured, LLM-ready code context.

It is designed for documentation agents, code review agents, changelog generators, release note tools and internal engineering assistants that need to understand what changed without dumping an entire repository into a prompt.

## Features

- Parses unified diffs and GitHub patch responses.
- Loads changed files at the target commit/ref.
- Includes exact file patch excerpts, including removed and added lines.
- Extracts full enclosing blocks around changed lines.
- Finds related same-file blocks through changed identifiers.
- Follows local dependencies from imports, includes, templates and aliases.
- Resolves common project aliases from `tsconfig.json`, `jsconfig.json` and Composer `psr-4`.
- Searches repository-wide references when the provider can list files.
- Detects symbols across language families with a generic adapter system.
- Includes a WordPress-aware adapter for hooks, filters, CPTs, taxonomies, shortcodes, enqueued assets and template parts.
- Produces a full debug JSON report when `debug=True`.
- Includes the exact compact LLM pack inside the debug JSON.
- Provides a configurable `to_llm_pack(...)` output that prioritizes the most important context.
- Works without native parser dependencies, while allowing custom adapters for deeper language-specific analysis.
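
To make the first feature concrete, here is a minimal sketch of how changed line numbers can be recovered from a unified diff. It is an illustration only, not the library's parser; the function name `added_line_numbers` is hypothetical, and the sketch assumes `git diff` style headers (`+++ b/path`).

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,4 +12,6 @@ def foo():".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_line_numbers(diff_text: str) -> dict[str, list[int]]:
    """Map each changed file path to the new-side line numbers of its added lines."""
    result: dict[str, list[int]] = {}
    path = None
    new_line = old_left = new_left = 0
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: the counts from the header tell us when it ends.
            if line.startswith("\\"):
                continue  # "\ No newline at end of file" marker
            if line.startswith("+"):
                if path is not None:
                    result.setdefault(path, []).append(new_line)
                new_line += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1  # removed lines do not advance the new-side counter
            else:
                new_line += 1  # context line, present on both sides
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            # "+++ b/path" names the new side; "/dev/null" means the file was deleted.
            target = line[4:].strip()
            path = None if target == "/dev/null" else target.removeprefix("b/")
            continue
        match = HUNK_HEADER.match(line)
        if match:
            new_line = int(match.group(3))
            old_left = int(match.group(2) or 1)  # an omitted count means 1
            new_left = int(match.group(4) or 1)
    return result
```

Line numbers like these are the natural starting point for the enclosing-block and related-block features listed above.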

## Installation

```bash
pip install code-context-py
```

For development from source:

```bash
git clone https://github.com/mlinaresweb/code-context-py.git
cd code-context-py
pip install -e .
```

## Quick Start

```python
from code_context_py import CallbackFileProvider, ExtractionOptions, build_context_from_diff

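def load_file_at_ref(path: str) -> str | None:
    # Return the text of `path` at the target ref; the `None` return
    # presumably signals that the file does not exist there.
    ...

provider = CallbackFileProvider(load_file_at_ref)

# diff_text: the diff to analyze, as a unified diff string.
report = build_context_from_diff(
    diff_text,
    provider,
    options=ExtractionOptions(debug=True),
)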

full_context = report.to_prompt()
llm_pack = report.to_llm_pack(max_chars=70000, max_block_chars=14000)
```

With `debug=True`, the library writes the report to:

```text
code-context-debug/{diff_hash}.json
```

The debug JSON contains the full structured report and the compact LLM pack generated from that report.

## Local Git Provider

```python
from code_context_py import LocalGitFileProvider, build_context_from_diff

provider = LocalGitFileProvider("/path/to/repo", "commit-sha")
report = build_context_from_diff(diff_text, provider)
pack = report.to_llm_pack(max_chars=70000)
```

`LocalGitFileProvider` can read files from a commit and list repository files, enabling dependency traversal and repository-wide reference search.

## GitHub Commit

```python
from code_context_py import build_context_from_github_commit

pack = build_context_from_github_commit(
    "owner/repo",
    "commit-sha",
    token="github-token",
    max_chars=70000,
)
```

## Private GitHub Repositories

Private repositories work through the GitHub API as long as the token can read the repository contents.

You can pass the token explicitly:

```python
from code_context_py import build_context_from_github_commit

pack = build_context_from_github_commit(
    "owner/private-repo",
    "commit-sha",
    token="github-token-with-repo-read-access",
    max_chars=70000,
)
```

Or set one of these environment variables:

```bash
export GITHUB_TOKEN="github-token-with-repo-read-access"
# or
export GH_TOKEN="github-token-with-repo-read-access"
```

Then call the API without passing `token`:

```python
pack = build_context_from_github_commit("owner/private-repo", "commit-sha", max_chars=70000)
```

The token needs permission to read repository metadata and contents. For classic personal access tokens, use `repo` for private repositories. For fine-grained tokens, grant read access to `Contents` and `Metadata` for the target repositories.

You can check access first:

```python
from code_context_py import can_access_github_repository

if not can_access_github_repository("owner/private-repo", token="github-token"):
    raise RuntimeError("Token cannot read this repository")
```

## GitHub Compare

```python
from code_context_py import build_context_from_github_compare

pack = build_context_from_github_compare(
    "owner/repo",
    "base-ref",
    "head-ref",
    token="github-token",
    max_chars=70000,
)
```

## Debug Mode

Debug mode is enabled explicitly through `ExtractionOptions` (or the `--debug` CLI flag), not through environment variables:

```python
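from code_context_py import ExtractionOptions

ExtractionOptions(debug=True)                                   # default: code-context-debug/{diff_hash}.json
ExtractionOptions(debug=True, debug_dir="debug/code-context")   # custom directory
ExtractionOptions(debug=True, debug_path="debug/context.json")  # exact output file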
```

The debug report includes:

- changed files
- exact patch excerpts
- symbols
- enclosing blocks
- related blocks
- dependency blocks
- repository reference blocks
- graph edges
- warnings
- the compact LLM pack
- the LLM pack length

## LLM Pack

`to_prompt()` returns the full extracted report. Use it for inspection or for very large context windows.

`to_llm_pack(...)` returns a prioritized, compact prompt pack for LLMs:

```python
pack = report.to_llm_pack(
    max_chars=70000,
    max_block_chars=14000,
)
```

Priority order:

1. Exact file patches.
2. Changed enclosing blocks.
3. Dependency blocks.
4. Same-file related blocks.
5. Repository-wide references.
6. File preludes.

The pack includes a manifest and, when the budget is too small to fit everything, a list of omitted sections. The full context remains available in the debug JSON.
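
The idea of packing by priority under a character budget can be sketched as a simple greedy loop. This is not the library's packing algorithm; the `Section` class, the `pack_sections` function and the `[truncated]` marker are all hypothetical, and only the `max_chars` and `max_block_chars` parameter names come from the API above.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    text: str
    priority: int  # lower number = more important (1 = exact file patches)


def pack_sections(
    sections: list[Section],
    max_chars: int = 70000,
    max_block_chars: int = 14000,
) -> tuple[str, list[str]]:
    """Greedily pack sections by priority; return (pack_text, omitted_titles)."""
    parts: list[str] = []
    omitted: list[str] = []
    used = 0
    # sorted() is stable, so sections with equal priority keep their input order.
    for section in sorted(sections, key=lambda s: s.priority):
        body = section.text
        if len(body) > max_block_chars:
            # Cut the body to the block limit; the marker is appended after the cut.
            body = body[:max_block_chars] + "\n[truncated]"
        block = f"## {section.title}\n{body}\n"
        if used + len(block) > max_chars:
            omitted.append(section.title)  # recorded so the reader knows what is missing
            continue
        parts.append(block)
        used += len(block)
    return "".join(parts), omitted
```

Because the loop continues after an omission, a small low-priority section can still fit after a large one is skipped. A stricter variant could stop at the first section that does not fit.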

## WordPress Support

The default adapter registry includes WordPress-aware extraction for PHP themes/plugins and mixed WordPress codebases.

It detects and relates context around:

- `add_action`
- `add_filter`
- `register_post_type`
- `register_taxonomy`
- `add_shortcode`
- `wp_enqueue_script`
- `wp_enqueue_style`
- `get_template_part`
- `locate_template`
- PHP `require` / `include`
- theme templates
- template parts
- JS/CSS/SCSS assets referenced by changed code

WordPress support adds no extra dependencies. For highly specialized projects, you can register your own adapter.
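
As a rough illustration of dependency-free hook detection, the sketch below finds simple `add_action` and `add_filter` registrations with a regular expression. It is not the adapter's actual implementation; `find_hooks` is a hypothetical name, and the pattern only covers calls where the hook and the callback are both plain string literals.

```python
import re

# Matches add_action('hook', 'callback') and add_filter("hook", "callback")
# when both arguments are plain string literals.
HOOK_CALL = re.compile(
    r"""\b(add_action|add_filter)\s*\(\s*
        (['"])(?P<hook>[^'"]+)\2\s*,\s*
        (['"])(?P<callback>[^'"]+)\4""",
    re.VERBOSE,
)


def find_hooks(php_source: str) -> list[tuple[str, str, str]]:
    """Return (function, hook_name, callback_name) for each simple hook registration."""
    return [
        (m.group(1), m.group("hook"), m.group("callback"))
        for m in HOOK_CALL.finditer(php_source)
    ]
```

Closures, array callables such as `[$this, 'method']` and dynamically built hook names are not covered by this pattern.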

## Custom Adapters

```python
from code_context_py import AdapterRegistry, build_context_from_diff

registry = AdapterRegistry()
registry.register(MyLanguageAdapter())  # MyLanguageAdapter: your own adapter class, defined elsewhere

report = build_context_from_diff(diff_text, provider, adapters=registry)
```

Adapters can customize:

- file support detection
- enclosing range detection
- related range detection
- symbol extraction
- import/template/dependency extraction
- tokenization
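
The adapter interface itself is not shown in this README, so the sketch below is a standalone function rather than an adapter method. It illustrates what enclosing range detection could compute for Python source, using indentation only. The names `enclosing_block` and `HEADER_PREFIXES` are hypothetical, and how such logic would plug into an adapter is not specified here.

```python
HEADER_PREFIXES = ("def ", "async def ", "class ")


def _indent(text: str) -> int:
    """Count leading whitespace characters."""
    return len(text) - len(text.lstrip())


def _block_end(lines: list[str], header_idx: int) -> int:
    """Return the 0-based index of the last line in the block opened at header_idx."""
    header_indent = _indent(lines[header_idx])
    end = header_idx
    for i in range(header_idx + 1, len(lines)):
        if not lines[i].strip():
            continue  # blank lines never close a block on their own
        if _indent(lines[i]) <= header_indent:
            break
        end = i
    return end


def enclosing_block(lines: list[str], changed_line: int) -> tuple[int, int]:
    """Return the 1-based inclusive (start, end) of the def/class around changed_line."""
    idx = changed_line - 1
    # Walk upward and accept the first header whose block actually contains the line.
    for header_idx in range(idx, -1, -1):
        if lines[header_idx].lstrip().startswith(HEADER_PREFIXES):
            end = _block_end(lines, header_idx)
            if end >= idx:
                return header_idx + 1, end + 1
    # Module-level code: fall back to the changed line itself.
    return changed_line, changed_line
```

The sketch ignores decorators and multi-line signatures, and it assumes consistent indentation. A parser-based approach would handle those cases more precisely.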

## CLI

```bash
code-context-py --diff change.patch --repo-path /path/to/repo --ref commit-sha
code-context-py --diff github --github-repo owner/repo --github-ref commit-sha --github-token "$GITHUB_TOKEN"
code-context-py --diff github-compare --github-repo owner/repo --github-base main --github-head feature --github-token "$GITHUB_TOKEN"
```

Useful options:

```bash
--max-chars 70000
--max-block-chars 14000
--max-changed-files 50
--max-blocks-per-file 8
--dependency-depth 2
--debug
--debug-dir code-context-debug
--debug-path debug/context.json
--llm-pack-max-chars 70000
--llm-pack-max-block-chars 14000
```

## Publishing

```bash
python -m pip install --upgrade build twine
python -m build
python -m twine upload dist/*
```

## Design Philosophy

No generic tool can provide perfect semantic call graphs for every programming language and framework without language-specific parsers. `code-context-py` is built to be robust in real mixed repositories by combining:

- exact diff data
- full changed blocks
- dependency traversal
- repository-wide reference search
- framework-aware adapters
- compact prompt packing
- full debug inspection

This makes it useful immediately across many languages while keeping a clean path for deeper adapters where a project needs more precision.
