Metadata-Version: 2.4
Name: llm-fragments-folder
Version: 0.3.0
Summary: LLM plugin to load entire folder contents as fragments
Project-URL: Homepage, https://github.com/michael-borck/llm-fragments-folder
Project-URL: Issues, https://github.com/michael-borck/llm-fragments-folder/issues
Author: Michael Borck
License: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Text Processing
Requires-Python: >=3.10
Requires-Dist: llm>=0.24
Requires-Dist: pathspec
Description-Content-Type: text/markdown

# llm-fragments-folder

An [LLM](https://llm.datasette.io/) plugin that loads entire folder contents as fragments, turning any directory into a chat-ready knowledge base.

## Installation

```bash
llm install llm-fragments-folder
```

Or install from source:

```bash
git clone https://github.com/michael-borck/llm-fragments-folder.git
cd llm-fragments-folder
pip install -e .
```

## Usage

Two fragment loaders are provided: `folder:` for general document collections and `project:` for software projects.

### folder: - Load documents from a directory

```bash
# Chat against all docs in a folder
llm chat -f folder:./docs

# Ask a question about files in the current directory
llm -f folder:. "What are these documents about?"

# Combine with a specific model
llm -f folder:~/notes -m claude-sonnet-4-5 "Find all action items"

# Add custom instructions with a system prompt
llm -f folder:./research --system "You are a research assistant" "Summarize the key findings"

# Only load specific file types
llm -f "folder:./docs?ext=md,txt" "Summarize the docs"
llm -f "folder:.?ext=json,yaml" "Explain these configs"
```

### project: - Load a software project (respects .gitignore)

```bash
# Explain a codebase
llm chat -f project:.

# Ask about a specific project
llm -f project:./my-app "What framework does this use?"

# Code review
llm -f project:. "Review this code for security issues"

# Architecture overview
llm -f project:~/repos/my-api -m claude-sonnet-4-5 "Describe the architecture"

# Only Python files
llm -f "project:.?ext=py" "Review this code"
```

The `project:` loader:

- Uses `git ls-files` when inside a git repo (most accurate)
- Falls back to parsing `.gitignore` patterns if git is not available
- Prepends a file tree summary as the first fragment
- Automatically skips `node_modules`, `__pycache__`, `.git`, `venv`, `dist`, `build`, etc.
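
The lookup order in the list above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function names are hypothetical, the skip list is only a subset, and the `.gitignore` parsing step of the fallback is left out, so the fallback here only prunes the always-skipped directories.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path

# Subset of the directories the README lists as always skipped.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", "dist", "build"}


def git_listed_files(root: Path) -> list[str] | None:
    """Return paths relative to root as reported by git, or None if git is unusable."""
    try:
        result = subprocess.run(
            ["git", "ls-files", "-z", "--cached", "--others", "--exclude-standard"],
            cwd=root,
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # git is not installed, or root is not inside a repository
    return [entry for entry in result.stdout.split("\0") if entry]


def walk_files(root: Path) -> list[str]:
    """Fallback: walk the tree, pruning skipped directories in place."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            found.append(Path(dirpath, name).relative_to(root).as_posix())
    return found


def list_project_files(root: str) -> list[str]:
    """Prefer git's view of the project and fall back to a plain directory walk."""
    root_path = Path(root).expanduser().resolve()
    files = git_listed_files(root_path)
    if files is None:
        files = walk_files(root_path)
    # Apply the skip list on both paths and drop entries that no longer exist on disk.
    return sorted(
        f
        for f in files
        if not SKIP_DIRS.intersection(Path(f).parts) and (root_path / f).is_file()
    )
```

Asking git with `--exclude-standard` means every ignore source git knows about is honored without reimplementing its rules, which is presumably why the git route is described as the most accurate.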

### Combining with other fragments

Fragments compose naturally with each other and with LLM's other features:

```bash
# Folder + URL context
llm -f folder:./docs -f https://example.com/api-spec "Compare our docs to the spec"

# Folder + system prompt
llm -f folder:./meeting-notes --system "Extract action items with owners and dates" ""

# Project + GitHub issue
llm install llm-fragments-github
llm -f project:. -f issue:user/repo/42 "Implement this feature"
```

## What gets loaded

**Text file detection** is based on file extension and filename, plus a shebang check for extensionless files. Supported types include:

- Documents: `.md`, `.qmd`, `.txt`, `.rst`, `.adoc`, `.tex`, `.org`
- Code: `.py`, `.js`, `.ts`, `.go`, `.rs`, `.java`, `.rb`, `.c`, `.cpp`, and many more
- Config: `.json`, `.yaml`, `.yml`, `.toml`, `.ini`, `.env`, `.cfg`
- Web: `.html`, `.css`, `.scss`, `.svg`, `.xml`
- Data: `.csv`, `.tsv`, `.sql`, `.graphql`
- Dotfiles: `.bashrc`, `.zshrc`, `.vimrc`, `.gitconfig`, `.tmux.conf`, `.profile`, `.npmrc`, etc.
- Special files: `Makefile`, `Dockerfile`, `LICENSE`, etc.
- Shebang scripts: extensionless files starting with `#!`

**Always skipped directories**: `.git`, `node_modules`, `__pycache__`, `.venv`, `venv`, `dist`, `build`, `.idea`, `.vscode`, `.mypy_cache`, `.pytest_cache`, etc.
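
As an illustration of how such a check could be layered, here is a small sketch. The function name and the three sets are hypothetical and contain only a handful of the types listed above; the real plugin's tables and ordering may differ.

```python
from pathlib import Path

# Small illustrative subsets of the types listed in this README.
TEXT_EXTENSIONS = {".md", ".txt", ".rst", ".py", ".js", ".json", ".yaml", ".toml", ".html", ".csv", ".sql"}
KNOWN_DOTFILES = {".bashrc", ".zshrc", ".vimrc", ".gitconfig", ".profile", ".npmrc"}
SPECIAL_NAMES = {"Makefile", "Dockerfile", "LICENSE"}


def is_text_candidate(path: Path) -> bool:
    """Decide by name first, then by extension, then by a shebang check."""
    name = path.name
    if name in SPECIAL_NAMES or name in KNOWN_DOTFILES:
        return True
    if path.suffix.lower() in TEXT_EXTENSIONS:
        return True
    if path.suffix == "":
        # Extensionless file: treat it as a script if it starts with "#!".
        try:
            with path.open("rb") as handle:
                return handle.read(2) == b"#!"
        except OSError:
            return False
    return False
```

Checking names and extensions before opening the file keeps the common case cheap; only extensionless files need a read.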

### Filtering by extension

Use `?ext=` to control which file types are loaded. Extensions can be specified with or without the leading dot (`md` and `.md` both work). Multiple extensions are comma-separated.

#### Include only (default)

```bash
llm -f "folder:./src?ext=py,js,ts" "Review this code"
llm -f "project:.?ext=md,txt" "Summarize the documentation"
```

#### Exclude with `!`

Prefix extensions with `!` to exclude them (everything else is included):

```bash
# Everything except markdown
llm -f "folder:.?ext=!md" "Review the non-docs files"

# Everything except markdown and text
llm -f "project:.?ext=!md,!txt" "Focus on the code"
```

#### Force-include with `+`

Use `+` to force-include custom or non-standard extensions:

```bash
# Exclude markdown, but include a custom extension
llm -f "folder:.?ext=!md,+custom" "Review these files"

# Include only Python and a bespoke file type
llm -f "folder:.?ext=py,+myformat" "Analyze these"
```

#### Dotfiles

Use `dotfiles` to grab all dotfiles (`.bashrc`, `.gitconfig`, `.vimrc`, etc.):

```bash
# Load all dotfiles
llm -f "folder:~?ext=dotfiles" "Explain my shell config"

# Combine dotfiles with other extensions
llm -f "folder:~?ext=dotfiles,md" "Summarize my config and docs"

# Target a specific dotfile by name
llm -f "folder:~?ext=.bashrc,.zshrc" "Compare these shell configs"

# Exclude markdown but include all dotfiles
llm -f "folder:.?ext=!md,dotfiles" "Review configs and code"
```
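
The filter syntax described in this section can be parsed with a few lines of Python. The sketch below is an assumption about one reasonable way to do it, not the plugin's implementation: `ExtFilter`, `split_argument`, and `parse_ext_spec` are hypothetical names, and how the resulting sets are combined when includes and excludes are mixed is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExtFilter:
    include: set[str] = field(default_factory=set)  # plain tokens such as "py"
    exclude: set[str] = field(default_factory=set)  # tokens prefixed with "!"
    force: set[str] = field(default_factory=set)    # tokens prefixed with "+"
    dotfiles: bool = False                          # the literal "dotfiles" token


def split_argument(argument: str) -> tuple[str, str]:
    """Split 'path?ext=spec' into (path, spec); spec is '' when no filter is given."""
    path, _, query = argument.partition("?")
    # Parsed by hand on purpose: urllib.parse.parse_qs would decode "+" as a space.
    spec = query[len("ext="):] if query.startswith("ext=") else ""
    return path, spec


def normalize(ext: str) -> str:
    """Accept 'md' and '.md' alike and return the dotted, lowercase form."""
    return "." + ext.lstrip(".").lower()


def parse_ext_spec(spec: str) -> ExtFilter:
    """Sort each comma-separated token into the matching bucket."""
    result = ExtFilter()
    for token in (part.strip() for part in spec.split(",")):
        if not token:
            continue
        if token == "dotfiles":
            result.dotfiles = True
        elif token.startswith("!"):
            result.exclude.add(normalize(token[1:]))
        elif token.startswith("+"):
            result.force.add(normalize(token[1:]))
        else:
            result.include.add(normalize(token))
    return result
```

For example, `parse_ext_spec("!md,+custom")` puts `.md` in `exclude` and `.custom` in `force`, while `parse_ext_spec("dotfiles,md")` sets the `dotfiles` flag and puts `.md` in `include`.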

**Binary file detection**: Files containing null bytes are automatically detected as binary and skipped, even if force-included via `+`. This prevents garbled output from PDFs, images, Word docs, etc.

**Safety limits**: Files larger than 1 MB are skipped, and each loader call loads at most 500 files.
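
A rough sketch of how these checks might fit together is shown below. The names are hypothetical, "1MB" is assumed to mean 1,000,000 bytes, and reading the whole file to look for a null byte is a simplification that is only reasonable because oversized files are rejected first.

```python
from pathlib import Path

MAX_FILE_BYTES = 1_000_000  # assumption: the 1 MB limit read as 10**6 bytes
MAX_FILES = 500             # per-call file limit stated in this README


def contains_null_byte(path: Path) -> bool:
    """Treat any file containing a NUL byte as binary."""
    return b"\x00" in path.read_bytes()


def select_loadable(paths, max_files=MAX_FILES, max_bytes=MAX_FILE_BYTES):
    """Keep small, non-binary files, stopping once the file limit is reached."""
    selected = []
    for path in paths:
        if len(selected) >= max_files:
            break
        if path.stat().st_size > max_bytes:
            continue  # too large
        if contains_null_byte(path):
            continue  # binary, skipped even if its extension was force-included
        selected.append(path)
    return selected
```

Running the size check before the content check means a large file is never read into memory just to be discarded.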

## How it works

Each file becomes a separate LLM fragment, wrapped with a filename header:

```
--- path/to/file.py ---
<file contents>
```

This means LLM's fragment deduplication works at the file level. If you reference the same folder across multiple prompts, files that haven't changed won't be stored again in the log database.
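
To illustrate why per-file fragments deduplicate well, the sketch below wraps file contents in the header format shown above and hashes the result. The helper names are hypothetical, and the use of SHA-256 over the fragment text is an assumption about how LLM identifies stored fragments; the point is only that identical wrapped text always maps to the same identifier.

```python
import hashlib


def wrap_file(relative_path: str, contents: str) -> str:
    """Prefix file contents with the '--- path ---' header line."""
    return f"--- {relative_path} ---\n{contents}"


def content_hash(text: str) -> str:
    """Hash the full fragment text (assumed scheme: SHA-256 of the UTF-8 bytes)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first_run = wrap_file("path/to/file.py", "print('hello')\n")
second_run = wrap_file("path/to/file.py", "print('hello')\n")
edited = wrap_file("path/to/file.py", "print('goodbye')\n")

unchanged = content_hash(first_run) == content_hash(second_run)  # same file, same hash
changed = content_hash(first_run) != content_hash(edited)        # edit produces a new hash
```

Because the path is part of the wrapped text, renaming a file also produces a new fragment under this scheme, even when its contents are unchanged.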

## Development

```bash
# Clone and install for development
git clone https://github.com/michael-borck/llm-fragments-folder.git
cd llm-fragments-folder
uv sync

# Run tests
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .

# Type checking
uv run mypy llm_fragments_folder.py
```

## Acknowledgments

- [Simon Willison](https://simonwillison.net/) for [LLM](https://llm.datasette.io/) and the excellent fragment plugin API
- Inspired by [files-to-prompt](https://github.com/simonw/files-to-prompt) and [llm-fragments-github](https://github.com/simonw/llm-fragments-github)

## License

Apache 2.0
