lmcat
A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects .gitignore and .lmignore patterns and provides configurable output formatting.
Features
- Tree view of directory structure with file statistics (lines, characters, tokens)
- Includes file contents with clear delimiters
- Respects .gitignore patterns (can be disabled)
- Supports custom ignore patterns via .lmignore
- Configurable via pyproject.toml, lmcat.toml, or lmcat.json
  - you can specify glob_process or decider_process to run on files, for example to convert a notebook to a markdown file
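To make the per-file statistics concrete, here is a minimal sketch of counting lines, characters, and tokens for a piece of text. The names FileStats and compute_stats are hypothetical and are not taken from lmcat. The token count uses a plain whitespace split as a stand-in, so lmcat's own numbers may differ, especially when a real tokenizer is configured.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical container for the three statistics shown in the tree view."""

    lines: int
    chars: int
    tokens: int


def compute_stats(text: str) -> FileStats:
    """Count lines, characters, and whitespace-separated tokens in text."""
    return FileStats(
        lines=len(text.splitlines()),
        chars=len(text),
        # Assumption: whitespace splitting stands in for a real tokenizer.
        tokens=len(text.split()),
    )


stats = compute_stats("def f():\n    return 1\n")
print(stats)  # FileStats(lines=2, chars=22, tokens=4)
```

A whitespace split is cheap and dependency-free, which is why it works as a fallback when no tokenizer package is installed.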
Installation
Install from PyPI:
pip install lmcat
or, install with support for counting tokens:
pip install lmcat[tokenizers]
Usage
Basic usage: run python -m lmcat with no flags to concatenate the current directory. Common variations:
# Only show directory tree
python -m lmcat --tree-only
# Write output to file
python -m lmcat --output summary.md
# Print current configuration
python -m lmcat --print-cfg
The output will include a directory tree and the contents of each non-ignored file.
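If you want to drive the same commands from a script, the sketch below wraps them with subprocess. The helper names build_lmcat_command and run_lmcat are hypothetical. Only the --tree-only and --output flags shown above are used, and run_lmcat assumes lmcat is installed in the current interpreter's environment.

```python
import subprocess
import sys
from typing import List, Optional


def build_lmcat_command(
    tree_only: bool = False, output: Optional[str] = None
) -> List[str]:
    """Build an argv list using only the flags documented above."""
    cmd = [sys.executable, "-m", "lmcat"]
    if tree_only:
        cmd.append("--tree-only")
    if output is not None:
        cmd.extend(["--output", output])
    return cmd


def run_lmcat(cwd: str, tree_only: bool = False, output: Optional[str] = None) -> str:
    """Run lmcat inside cwd and return whatever it printed to stdout.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        build_lmcat_command(tree_only=tree_only, output=output),
        cwd=cwd,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

For example, run_lmcat(".", tree_only=True) would return the directory tree of the current directory as a string.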
Command Line Options
- -t, --tree-only: Only print the directory tree, not file contents
- -o, --output: Specify an output file (defaults to stdout)
- -h, --help: Show help message
Configuration
lmcat is best configured via a tool.lmcat section in pyproject.toml:
[tool.lmcat]
# Tree formatting
tree_divider = "│ " # Vertical lines in tree
tree_indent = " " # Indentation
tree_file_divider = "├── " # File/directory entries
content_divider = "``````" # File content delimiters
# Processing pipeline
tokenizer = "gpt2" # or "whitespace-split"
tree_only = false # Only show tree structure
on_multiple_processors = "except" # Behavior when multiple processors match
# File handling
ignore_patterns = ["*.tmp", "*.log"] # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]
# processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"
Development
Setup
- Clone the repository:
git clone https://github.com/mivanit/lmcat
cd lmcat
- Set up the development environment:
make setup
Development Commands
The project uses make for common development tasks:
- make dep: Install/update dependencies
- make format: Format code using ruff and pycln
- make test: Run tests
- make typing: Run type checks
- make check: Run all checks (format, test, typing)
- make clean: Clean temporary files
- make docs: Generate documentation
- make build: Build the package
- make publish: Publish to PyPI (maintainers only)
Run make help to see all available commands.
Running Tests
make test
For verbose output:
VERBOSE=1 make test
Roadmap
- more processors and deciders, like:
  - only first n lines if file is too large
  - first few lines of a csv file
  - json schema of a big json/toml/yaml file
  - metadata extraction from images
- better tests, I feel like gitignore/lmignore interaction is broken
- llm summarization and caching of those summaries in .lmsummary/
- reasonable defaults for file extensions to ignore
- web interface
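As an illustration of the "only first n lines" idea, here is a rough sketch of what such a processor might look like. It is not part of lmcat. The function name first_n_lines, the default of 50 lines, and the assumption that a processor takes a path and returns text are all hypothetical.

```python
from pathlib import Path


def first_n_lines(path: Path, n: int = 50) -> str:
    """Return at most the first n lines of a text file, marking any truncation."""
    kept = []
    truncated = False
    # Read lazily so a very large file is never loaded in full.
    with path.open(encoding="utf-8", errors="replace") as handle:
        for index, line in enumerate(handle):
            if index >= n:
                truncated = True
                break
            kept.append(line)
    text = "".join(kept)
    if truncated:
        text += f"... (truncated after {n} lines)\n"
    return text
```

Reading line by line keeps memory use bounded by n lines rather than by the file size, which matters for exactly the files this processor would target.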
"""
.. include:: ../README.md
"""

from lmcat.lmcat import main

__all__ = [
    # funcs
    "main",
    # submodules
    "lmcat",
    "file_stats",
    "processing_pipeline",
    "processors",
]
def main() -> None:
def main() -> None:
    """Main entry point for the script"""
    arg_parser = argparse.ArgumentParser(
        description="lmcat - list tree and content, combining .gitignore + .lmignore",
        add_help=False,
    )
    arg_parser.add_argument(
        "-t",
        "--tree-only",
        action="store_true",
        default=False,
        help="Only print the tree, not the file contents.",
    )
    arg_parser.add_argument(
        "-o",
        "--output",
        action="store",
        default=None,
        help="Output file to write the tree and contents to.",
    )
    arg_parser.add_argument(
        "-h", "--help", action="help", help="Show this help message and exit."
    )
    arg_parser.add_argument(
        "--print-cfg",
        action="store_true",
        default=False,
        help="Print the configuration as json and exit.",
    )
    arg_parser.add_argument(
        "--allow-plugins",
        action="store_true",
        default=False,
        help="Allow plugins to be loaded from the plugins file. WARNING: this will execute arbitrary code found in the file pointed to by `config.plugins_file`, and **is a security risk**.",
    )

    args: argparse.Namespace = arg_parser.parse_known_args()[0]
    root_dir: Path = Path(".").resolve()
    config: LMCatConfig = LMCatConfig.read(root_dir)

    # CLI overrides
    config.tree_only = args.tree_only
    config.allow_plugins = args.allow_plugins

    # print cfg and exit if requested
    if args.print_cfg:
        print(json.dumps(config.serialize(), indent="\t"))
        return

    # assemble summary
    summary: str = assemble_summary(root_dir=root_dir, config=config)

    # Write output
    if args.output:
        output_path: Path = Path(args.output)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(summary, encoding="utf-8")
    else:
        if sys.platform == "win32":
            sys.stdout = io.TextIOWrapper(
                sys.stdout.buffer, encoding="utf-8", errors="replace"
            )
            sys.stderr = io.TextIOWrapper(
                sys.stderr.buffer, encoding="utf-8", errors="replace"
            )

        print(summary)
Main entry point for the script
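Two details of main() are easy to miss: parse_known_args tolerates unrecognized arguments instead of exiting, and the output branch creates missing parent directories before writing. The standalone sketch below reproduces just those two behaviors with the standard library. The helper names parse_cli and emit are hypothetical and do not exist in lmcat.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_cli(argv: List[str]) -> argparse.Namespace:
    """Parse the two documented flags, ignoring anything unrecognized."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-t", "--tree-only", action="store_true", default=False)
    parser.add_argument("-o", "--output", action="store", default=None)
    # parse_known_args returns (namespace, leftovers) rather than erroring
    # out on unknown flags; main() keeps only the namespace.
    namespace, _leftovers = parser.parse_known_args(argv)
    return namespace


def emit(summary: str, output: Optional[str]) -> None:
    """Write the summary to a file if a path is given, else print it."""
    if output:
        path = Path(output)
        # Create any missing parent directories before writing.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(summary, encoding="utf-8")
    else:
        print(summary)


args = parse_cli(["-t", "--not-a-real-flag", "-o", "out/summary.md"])
print(args.tree_only, args.output)  # True out/summary.md
```

One consequence of tolerating unknown arguments is that a mistyped flag is silently ignored, so it is worth double-checking spelling when a flag seems to have no effect.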