docs for lmcat v0.1.1

lmcat


A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects .gitignore and .lmignore patterns and provides configurable output formatting.

Features

  • Tree view of directory structure with file statistics (lines, characters, tokens)
  • Includes file contents with clear delimiters
  • Respects .gitignore patterns (can be disabled)
  • Supports custom ignore patterns via .lmignore
  • Configurable via pyproject.toml, lmcat.toml, or lmcat.json
    • You can specify glob_process or decider_process entries to run processors on matching files, for example to convert a notebook to Markdown
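
The per-file statistics shown in the tree view (lines, characters, tokens) can be illustrated with a small sketch. This is not lmcat's own code: the `TextStats` class and `text_stats` function are hypothetical names, and tokens are counted by splitting on whitespace, which is assumed to approximate the "whitespace-split" tokenizer option rather than a model tokenizer such as "gpt2".

```python
from dataclasses import dataclass


@dataclass
class TextStats:
    """Hypothetical container for per-file statistics (not lmcat's class)."""

    lines: int
    chars: int
    tokens: int


def text_stats(text: str) -> TextStats:
    """Count lines, characters, and whitespace-separated tokens in text."""
    return TextStats(
        lines=len(text.splitlines()),
        chars=len(text),
        tokens=len(text.split()),  # assumption: whitespace splitting only
    )


example = "def add(a, b):\n    return a + b\n"
stats = text_stats(example)
print(stats.lines, stats.chars, stats.tokens)
```

A model tokenizer would usually report a different token count than whitespace splitting, so the numbers are only comparable when the same tokenizer is used.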

Installation

Install from PyPI:

pip install lmcat

or, install with support for counting tokens:

pip install lmcat[tokenizers]

Usage

Basic usage, run from the root of the directory you want to concatenate:

# Print the directory tree and file contents to stdout
python -m lmcat

# Only show directory tree
python -m lmcat --tree-only

# Write output to file
python -m lmcat --output summary.md

# Print current configuration
python -m lmcat --print-cfg

The output will include a directory tree and the contents of each non-ignored file.
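
To make the shape of that output concrete, here is a rough sketch of the idea: list the non-ignored files, then append each file's contents between delimiters. It is an illustration only, not how lmcat is implemented. The `sketch_summary` helper, the flat file listing (instead of a drawn tree), the `------` divider, and matching ignore patterns against file names with `fnmatch` are all assumptions made for this example.

```python
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path


def sketch_summary(root: Path, ignore_patterns: list[str], divider: str = "------") -> str:
    """Hypothetical helper: list kept files, then append each file's text."""
    files = sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and not any(fnmatchcase(p.name, pat) for pat in ignore_patterns)
    )
    names = [p.relative_to(root).as_posix() for p in files]
    parts = ["\n".join(names)]  # flat listing standing in for the tree
    for path, name in zip(files, names):
        text = path.read_text(encoding="utf-8").rstrip()
        parts.append(f"{divider} {name}\n{text}\n{divider}")
    return "\n\n".join(parts)


# Build a throwaway directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.py").write_text("print('hello')\n", encoding="utf-8")
    (root / "debug.log").write_text("noise\n", encoding="utf-8")
    demo = sketch_summary(root, ignore_patterns=["*.log"])

print(demo)
```

The sketch reads every kept file as UTF-8 text, so it would fail on binary files; a real tool needs to handle those separately.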

Command Line Options

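  • -t, --tree-only: Only print the directory tree, not file contents
  • -o, --output: Write the output to the given file instead of stdout
  • -h, --help: Show help message
  • --print-cfg: Print the current configuration as JSON and exit
  • --allow-plugins: Allow plugins to be loaded from the file pointed to by config.plugins_file. This executes arbitrary code found in that file and is a security risk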

Configuration

lmcat is best configured via a tool.lmcat section in pyproject.toml:

[tool.lmcat]
# Tree formatting
tree_divider = "│   "    # Vertical lines in tree
tree_indent = " "        # Indentation
tree_file_divider = "├── "  # File/directory entries
content_divider = "``````"  # File content delimiters

# Processing pipeline
tokenizer = "gpt2"  # or "whitespace-split"
tree_only = false   # Only show tree structure
on_multiple_processors = "except"  # Behavior when multiple processors match

# File handling
ignore_patterns = ["*.tmp", "*.log"]  # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]

# processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"

Development

Setup

  1. Clone the repository:
git clone https://github.com/mivanit/lmcat
cd lmcat
  2. Set up the development environment:
make setup

Development Commands

The project uses make for common development tasks:

  • make dep: Install/update dependencies
  • make format: Format code using ruff and pycln
  • make test: Run tests
  • make typing: Run type checks
  • make check: Run all checks (format, test, typing)
  • make clean: Clean temporary files
  • make docs: Generate documentation
  • make build: Build the package
  • make publish: Publish to PyPI (maintainers only)

Run make help to see all available commands.

Running Tests

make test

For verbose output:

VERBOSE=1 make test

Roadmap

  • more processors and deciders, like:
    • only first n lines if file is too large
    • first few lines of a csv file
    • json schema of a big json/toml/yaml file
    • metadata extraction from images
  • better tests; the .gitignore/.lmignore interaction in particular may be broken
  • llm summarization and caching of those summaries in .lmsummary/
  • reasonable defaults for file extensions to ignore
  • web interface

"""
.. include:: ../README.md
"""

from lmcat.lmcat import main

__all__ = [
	# funcs
	"main",
	# submodules
	"lmcat",
	"file_stats",
	"processing_pipeline",
	"processors",
]

def main() -> None:
	"""Main entry point for the script"""
	arg_parser = argparse.ArgumentParser(
		description="lmcat - list tree and content, combining .gitignore + .lmignore",
		add_help=False,
	)
	arg_parser.add_argument(
		"-t",
		"--tree-only",
		action="store_true",
		default=False,
		help="Only print the tree, not the file contents.",
	)
	arg_parser.add_argument(
		"-o",
		"--output",
		action="store",
		default=None,
		help="Output file to write the tree and contents to.",
	)
	arg_parser.add_argument(
		"-h", "--help", action="help", help="Show this help message and exit."
	)
	arg_parser.add_argument(
		"--print-cfg",
		action="store_true",
		default=False,
		help="Print the configuration as json and exit.",
	)
	arg_parser.add_argument(
		"--allow-plugins",
		action="store_true",
		default=False,
		help="Allow plugins to be loaded from the plugins file. WARNING: this will execute arbitrary code found in the file pointed to by `config.plugins_file`, and **is a security risk**.",
	)

	args: argparse.Namespace = arg_parser.parse_known_args()[0]
	root_dir: Path = Path(".").resolve()
	config: LMCatConfig = LMCatConfig.read(root_dir)

	# CLI overrides
	config.tree_only = args.tree_only
	config.allow_plugins = args.allow_plugins

	# print cfg and exit if requested
	if args.print_cfg:
		print(json.dumps(config.serialize(), indent="\t"))
		return

	# assemble summary
	summary: str = assemble_summary(root_dir=root_dir, config=config)

	# Write output
	if args.output:
		output_path: Path = Path(args.output)
		output_path.parent.mkdir(parents=True, exist_ok=True)
		output_path.write_text(summary, encoding="utf-8")
	else:
		if sys.platform == "win32":
			sys.stdout = io.TextIOWrapper(
				sys.stdout.buffer, encoding="utf-8", errors="replace"
			)
			sys.stderr = io.TextIOWrapper(
				sys.stderr.buffer, encoding="utf-8", errors="replace"
			)

		print(summary)

Main entry point for the script
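
One detail of main() worth illustrating is its use of parse_known_args()[0]: unrecognized arguments are collected separately instead of causing an error, and main() keeps only the parsed namespace. The sketch below reproduces just two of the options from the parser above; the flag `--not-a-real-flag` is made up for the demonstration.

```python
import argparse

# A cut-down parser with two of the options defined in main()
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("-t", "--tree-only", action="store_true", default=False)
parser.add_argument("-o", "--output", action="store", default=None)

# parse_known_args returns (namespace, leftover_args); main() keeps only [0]
args, unknown = parser.parse_known_args(
	["-t", "--not-a-real-flag", "-o", "summary.md"]
)

print(args.tree_only, args.output, unknown)
# prints: True summary.md ['--not-a-real-flag']
```

Because the leftover list is discarded in main(), a mistyped flag is silently ignored rather than reported. Also, as written, main() assigns config.tree_only from the command line unconditionally, so a tree_only = true setting in the configuration file is replaced by False whenever -t is not passed.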