Metadata-Version: 2.4
Name: repo-context-lib
Version: 0.2.1
Summary: Extract repository contents into formatted text for LLM context
Project-URL: Homepage, https://github.com/MabudAlam/repo-context
Project-URL: Documentation, https://github.com/MabudAlam/repo-context#readme
Author: repocontext
License: MIT
Keywords: context,github,llm,repository,tokenizer
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: httpx>=0.25.0
Requires-Dist: tiktoken>=0.5.0
Provides-Extra: cli
Requires-Dist: fastapi>=0.136.1; extra == 'cli'
Requires-Dist: uvicorn>=0.47.0; extra == 'cli'
Provides-Extra: dev
Requires-Dist: black>=23.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# repocontext

> Extract repository contents into formatted text for LLM context.

A Python library that fetches GitHub repositories, builds hierarchical file trees, and generates formatted text output containing the directory structure and file contents. It also reports token counts so the output can be checked against LLM context limits.

## Features

- **GitHub Support**: Fetch public and private repositories
- **Async Operations**: Efficient file fetching with concurrency control
- **Token Counting**: Accurate GPT token counting via tiktoken
- **Extensible**: Easy to add new providers (GitLab, Azure DevOps, etc.)
- **Structured Output**: Directory trees with file contents in markdown
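
The concurrency control mentioned above can be pictured with a small asyncio sketch. This is not the library's implementation: `fake_fetch`, `fetch_all`, and the limit of 3 are hypothetical names and values chosen for illustration, and the network call is replaced by a short sleep.

```python
import asyncio


async def fake_fetch(path: str) -> str:
    """Stand-in for a network request (hypothetical, no real I/O)."""
    await asyncio.sleep(0.01)
    return f"contents of {path}"


async def fetch_all(paths: list[str], limit: int = 3) -> list[str]:
    """Fetch every path with at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(path: str) -> str:
        # The semaphore blocks here once `limit` fetches are running.
        async with semaphore:
            return await fake_fetch(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(bounded(p) for p in paths))


results = asyncio.run(fetch_all(["a.py", "b.py", "c.py", "d.py"]))
```

A semaphore keeps the code simple while bounding how many requests hit the API at once, which matters when a repository has many files.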

## Installation

```bash
pip install repo-context-lib
```

## Quick Start

```python
from repocontext import fetch

# Simple usage - synchronous
result = fetch("https://github.com/owner/repo")
print(result.markdown)
print(f"Tokens: {result.token_count}")

# With file contents
result = fetch("https://github.com/owner/repo", token="ghp_xxx", fetch_content=True)
print(result.markdown)
```

## Usage Examples

### Using the Provider Directly

```python
import asyncio
from repocontext import GitHubProvider

async def main():
    provider = GitHubProvider()
    
    # Fetch repository
    result = await provider.get_repository(
        "https://github.com/owner/repo",
        token=None,
        fetch_content=True
    )
    
    print(result.directory_tree)
    print(f"Found {result.file_count} files")
    print(f"Total tokens: {result.token_count}")

asyncio.run(main())
```

### Filtering Files

```python
import asyncio
from repocontext import GitHubProvider

async def main():
    provider = GitHubProvider()
    nodes = await provider.fetch_tree("https://github.com/owner/repo")
    
    # Get only Python files
    py_files = [n for n in nodes if n.is_file() and n.get_extension() == ".py"]
    
    # Get files from specific directory
    src_files = [n for n in nodes if n.path.startswith("src/")]
    
    # Get files larger than 1KB
    large_files = [n for n in nodes if n.is_file() and n.size and n.size > 1024]

asyncio.run(main())
```
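
The same list-comprehension style extends to include and exclude glob patterns. The sketch below runs without the library: `Node` is a hypothetical stand-in exposing only `path` and `is_file()`, and `filter_paths` is an illustrative helper, not part of repocontext.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Node:
    """Hypothetical stand-in for FileNode with only the fields used here."""
    path: str
    type: str = "blob"

    def is_file(self) -> bool:
        return self.type == "blob"


def filter_paths(nodes, include=("*",), exclude=()):
    """Keep files matching any include pattern and no exclude pattern."""
    return [
        n.path
        for n in nodes
        if n.is_file()
        and any(fnmatch(n.path, pat) for pat in include)
        and not any(fnmatch(n.path, pat) for pat in exclude)
    ]


nodes = [
    Node("src/main.py"),
    Node("src/utils.py"),
    Node("tests/test_main.py"),
    Node("README.md"),
    Node("src", type="tree"),
]
selected = filter_paths(nodes, include=["*.py"], exclude=["tests/*"])
```

Note that `fnmatch` lets `*` match across `/`, so `*.py` selects Python files at any depth.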

### Building Trees and Formatting

```python
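from repocontext import build_tree, Formatter

# `nodes` is the flat list returned by provider.fetch_tree(), `selected_files`
# is the subset of those nodes to include, and `contents` holds the fetched
# file contents for them.

# Build tree from flat nodes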
tree = build_tree(
    nodes,
    selected_paths={n.path for n in selected_files},
    excluded_paths=set(),
    expanded_paths={n.path for n in nodes if n.is_directory()},
)

# Format as markdown
markdown = Formatter.format_markdown(tree, contents)
```
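
To make the flat-to-hierarchical step concrete, here is a minimal, library-independent sketch that nests slash-separated paths into dictionaries and renders them as an ASCII tree. It is not how `build_tree` or `Formatter` are implemented; `nest_paths` and `render` are hypothetical helpers.

```python
def nest_paths(paths):
    """Turn flat 'a/b/c.py' paths into nested dicts (files map to None)."""
    root = {}
    for path in sorted(paths):
        parts = path.split("/")
        cursor = root
        for part in parts[:-1]:
            # Create the directory level on first sight, then descend.
            cursor = cursor.setdefault(part, {})
        cursor[parts[-1]] = None
    return root


def render(tree, prefix=""):
    """Render the nested dict as a list of ASCII tree lines."""
    lines = []
    entries = list(tree.items())
    for index, (name, child) in enumerate(entries):
        last = index == len(entries) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        if child is not None:
            lines.extend(render(child, prefix + ("    " if last else "│   ")))
    return lines


tree = nest_paths(["src/main.py", "src/utils/io.py", "README.md"])
print("\n".join(render(tree)))
```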

## API Reference

### Main Function

```python
from repocontext import fetch

result = fetch(
    url="https://github.com/owner/repo",  # Required
    token=None,        # Optional GitHub token for private repos
    fetch_content=False # Set True to include file contents
)
```

Returns `RepositoryResult` with:
- `url` - The repository URL
- `branch` - The resolved branch name
- `files` - List of FileNode objects
- `directories` - List of directory paths
- `contents` - List of FileContent objects (when `fetch_content=True`)
- `markdown` - Full markdown output
- `directory_tree` - ASCII tree representation
- `token_count` - Total token count
- `line_count` - Total line count
- `file_count` - Number of files
- `stats` - Statistics dictionary
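
Per-file token counts make it possible to trim a result to a context budget. The sketch below is one possible approach, not a library feature: `Content` is a hypothetical stand-in carrying only the `path` and `token_count` fields documented for `FileContent`, and `select_within_budget` is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class Content:
    """Hypothetical stand-in carrying two of the FileContent fields."""
    path: str
    token_count: int


def select_within_budget(contents, budget):
    """Greedily keep the smallest files first until the budget is spent."""
    kept, used = [], 0
    for item in sorted(contents, key=lambda c: c.token_count):
        if used + item.token_count > budget:
            break  # every remaining file is at least this large
        kept.append(item.path)
        used += item.token_count
    return kept, used


contents = [
    Content("src/main.py", 1200),
    Content("README.md", 300),
    Content("src/big_data.py", 9000),
]
kept, used = select_within_budget(contents, budget=2000)
```

Smallest-first maximizes the number of files that fit; a real selection might instead rank files by relevance.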

### Providers

#### GitHubProvider

```python
from repocontext import GitHubProvider

provider = GitHubProvider()

# Set credentials (optional for public repos)
provider.set_credentials("ghp_your_token_here")

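# The calls below use await and must run inside an async function
# (for example one started with asyncio.run).

# Get full repository result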
result = await provider.get_repository(
    url="https://github.com/owner/repo",
    token=None,
    fetch_content=False,
    branch=None  # Optional branch override
)

# Fetch tree only
nodes = await provider.fetch_tree(url, branch="main", path="src")

# Fetch multiple files with concurrency
async for content in provider.fetch_multiple(file_nodes):
    print(content.path, len(content.text))
```

### Types

#### FileNode

```python
from repocontext import FileNode

node = FileNode(path="src/main.py", type="blob", size=1024, sha="abc123")

node.is_file()        # True
node.is_directory()   # False
node.get_name()       # "main.py"
node.get_extension()  # ".py"
```

#### TreeNode

```python
from repocontext import TreeNode

node = TreeNode(
    name="src",
    path="src",
    type="directory",
    children=[...],
    selected=True
)

node.is_file()        # False
node.is_directory()   # True
```

#### FileContent

```python
from repocontext import FileContent

content = FileContent(
    path="src/main.py",
    text="print('hello')",
    url="https://...",
    line_count=1,
    token_count=3
)
```

### Formatter

```python
from repocontext import Formatter

# Count tokens
tokens = Formatter.count_tokens("hello world")

# Format project tree
tree_str = Formatter.format_project_tree(tree_nodes)

# Format as markdown
markdown = Formatter.format_markdown(tree, contents)
```

### Tree Building

```python
from repocontext import build_tree, extract_directories

# Build hierarchical tree from flat nodes
tree = build_tree(
    nodes,
    selected_paths=set_of_selected_paths,
    excluded_paths=set_of_excluded_paths,
    expanded_paths=set_of_expanded_directories
)

# Extract all directory paths
dirs = extract_directories(nodes)
```

## Exception Handling

```python
from repocontext import (
    fetch,
    InvalidURLError,
    AuthenticationError,
    NotFoundError,
    RateLimitError,
    NetworkError,
)

try:
    result = fetch("https://github.com/owner/repo")
except InvalidURLError as e:
    print(f"Invalid URL: {e.user_message}")
except AuthenticationError as e:
    print(f"Auth failed: {e.user_message}")
except NotFoundError as e:
    print(f"Not found: {e.user_message}")
except RateLimitError as e:
    print(f"Rate limited: {e.user_message}")
except NetworkError as e:
    print(f"Network error: {e.user_message}")
```
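
Rate-limit errors are often worth retrying after a pause. The following is a minimal sketch of exponential backoff, assuming the caller wants to retry only on rate limiting. To stay runnable without the library it defines a hypothetical stand-in `RateLimitError`; `with_retries` and `flaky_fetch` are also illustrative names, not part of repocontext.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for repocontext's RateLimitError."""


def with_retries(call, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * 2 ** attempt)


# Demo: a fake call that is rate limited twice, then succeeds.
state = {"calls": 0}


def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RateLimitError("slow down")
    return "ok"


delays = []  # record the waits instead of actually sleeping
outcome = with_retries(flaky_fetch, sleep=delays.append)
```

In real use the callable would wrap `fetch(...)` and the except clause would catch the library's own `RateLimitError`, with a much larger base delay.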

## Extending the Package

### Adding a New Provider

Create a new provider by subclassing `BaseProvider` and registering it with the `register_provider` decorator:

```python
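from repocontext import FileContent, FileNode
from repocontext.providers import BaseProvider, register_provider

# ParsedRepoInfo also needs to be imported from the package;
# its module path is not documented in this README.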

@register_provider("gitlab")
class GitLabProvider(BaseProvider):
    API_BASE = "https://gitlab.com/api/v4"

    @property
    def get_type(self) -> str:
        return "gitlab"

    @property
    def get_name(self) -> str:
        return "GitLab"

    def requires_auth(self) -> bool:
        return True

    def validate_url(self, url: str) -> bool:
        return url.startswith("https://gitlab.com/")

    def parse_url(self, url: str) -> ParsedRepoInfo:
        # Parse the URL and return ParsedRepoInfo
        ...

    async def _fetch_tree(self, url: str, **options) -> list[FileNode]:
        # Fetch the repository tree
        ...

    async def _fetch_file_content(self, node: FileNode) -> FileContent:
        # Fetch a single file's content
        ...
```
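
As a rough idea of what a `parse_url` body could look like, the sketch below splits a gitlab.com URL with the standard library. The fields of `ParsedRepoInfo` are not documented here, so `RepoInfo` is a hypothetical stand-in with assumed `owner` and `repo` fields, and `parse_gitlab_url` is an illustrative function. It only reads the first two path segments and does not handle GitLab's nested groups.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class RepoInfo:
    """Hypothetical stand-in; ParsedRepoInfo's real fields may differ."""
    owner: str
    repo: str


def parse_gitlab_url(url: str) -> RepoInfo:
    """Split https://gitlab.com/<owner>/<repo> into its two parts."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != "gitlab.com":
        raise ValueError(f"Not a gitlab.com URL: {url}")
    # Drop empty segments caused by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Expected owner and repo in: {url}")
    owner, repo = parts[0], parts[1]
    return RepoInfo(owner=owner, repo=repo.removesuffix(".git"))


info = parse_gitlab_url("https://gitlab.com/owner/repo.git")
```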

## License

MIT