Metadata-Version: 2.4
Name: foldermix
Version: 0.1.26
Summary: Pack a folder into a single LLM-friendly context file
License: MIT License
        
        Copyright (c) 2026 Shay Palachy-Affek
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: cli,context,llm,packing
Requires-Python: >=3.10
Requires-Dist: click>=8
Requires-Dist: pathspec>=0.12
Requires-Dist: rich>=13.0
Requires-Dist: tomli>=2.0; python_version < '3.11'
Requires-Dist: typer>=0.9
Provides-Extra: all
Requires-Dist: openpyxl>=3.1; extra == 'all'
Requires-Dist: pypdf>=3.0; extra == 'all'
Requires-Dist: pypdfium2>=4.30; extra == 'all'
Requires-Dist: python-docx>=1.0; extra == 'all'
Requires-Dist: python-pptx>=0.6; extra == 'all'
Requires-Dist: rapidocr-onnxruntime>=1.3; extra == 'all'
Requires-Dist: tqdm>=4.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: hypothesis>=6.0; extra == 'dev'
Requires-Dist: pre-commit>=4.0; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs>=1.6; extra == 'docs'
Provides-Extra: markitdown
Requires-Dist: markitdown>=0.0.1; extra == 'markitdown'
Provides-Extra: mutation
Requires-Dist: mutmut>=2.4; extra == 'mutation'
Provides-Extra: ocr
Requires-Dist: pypdf>=3.0; extra == 'ocr'
Requires-Dist: pypdfium2>=4.30; extra == 'ocr'
Requires-Dist: rapidocr-onnxruntime>=1.3; extra == 'ocr'
Provides-Extra: office
Requires-Dist: openpyxl>=3.1; extra == 'office'
Requires-Dist: python-docx>=1.0; extra == 'office'
Requires-Dist: python-pptx>=0.6; extra == 'office'
Provides-Extra: pdf
Requires-Dist: pypdf>=3.0; extra == 'pdf'
Description-Content-Type: text/markdown

# foldermix

`foldermix` packs a local folder into one LLM-friendly context artifact you can inspect, share, or pipe into automation.

[![CI](https://github.com/foldermix/foldermix/actions/workflows/ci.yml/badge.svg)](https://github.com/foldermix/foldermix/actions/workflows/ci.yml)

Docs site: [foldermix.github.io/foldermix](https://foldermix.github.io/foldermix/)

```bash
pip install foldermix && foldermix pack . --out context.md
```

That command writes a Markdown bundle for the current folder (abridged below; the `README.md` section is omitted):

````markdown
# FolderPack Context

- **Root**: `/path/to/project`
- **Files**: 2
- **Total bytes**: 32

## Table of Contents

- [README.md](#readmemd)
- [src/app.py](#srcapppy)

---

## src/app.py {#srcapppy}

```python
print("hello")
```
````

`foldermix` keeps runs safe and predictable:

- skips sensitive files such as `.env`, private keys, and certificates unconditionally
- respects `.gitignore` by default
- skips hidden files and directories unless you opt in with `--hidden`
- orders files deterministically
- lets you inspect included and skipped files with `list`, `skiplist`, `preview`, `stats`, and `--report`
- supports redaction, size limits, duplicate suppression, and policy dry-runs when a workflow needs stricter controls
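The sensitive-file rule and deterministic ordering can be pictured as a name-based filter followed by a sort. The sketch below is a minimal illustration that assumes a hypothetical `SENSITIVE_PATTERNS` list; foldermix's real patterns are defined in its source and are likely broader.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical patterns for illustration only; not foldermix's actual list.
SENSITIVE_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "id_rsa", "id_ed25519", "*.crt", "*.p12"]


def is_sensitive(rel_path: str) -> bool:
    """Return True when the file name matches any sensitive pattern."""
    name = PurePosixPath(rel_path).name
    return any(fnmatch(name, pattern) for pattern in SENSITIVE_PATTERNS)


def select(paths: list[str]) -> list[str]:
    """Drop sensitive files, then sort so the output order is deterministic."""
    return sorted(p for p in paths if not is_sensitive(p))


print(select(["src/app.py", ".env", "certs/server.pem", "README.md"]))
# ['README.md', 'src/app.py']
```

Sorting after filtering means the same folder always yields the same file order, regardless of how the operating system lists directory entries.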

## Install

Recommended default:

```bash
pip install foldermix
```

Use this for the core CLI and text-like files: plain text, Markdown, source code, config/data files, WebVTT, and notebooks.

Choose extras only when your folders include documents that need optional converters:

| Need | Install | Notes |
|---|---|---|
| Core CLI for text/code/config files | `pip install foldermix` | Recommended default path. |
| Isolated global CLI with common document converters | `uv tool install "foldermix[all]"` | Adds `pdf`, `ocr`, `office`, and `tqdm`; excludes `markitdown`. |
| Full optional converter stack | `uv tool install "foldermix[all,markitdown]"` | Best fit when you want every optional converter available. |
| Existing `pipx` workflow | `pipx install "foldermix[all,markitdown]"` | Same package extras, managed through `pipx`. |
| Project-specific environment | `pip install "foldermix[all,markitdown]"` in a virtualenv | Use when the CLI should live with a project environment. |
| macOS/Linux system install | `brew tap foldermix/foldermix && brew install foldermix` | Core feature set only; Homebrew does not install Python extras. |

Extras:

- `pdf`: PDF text extraction with `pypdf`
- `ocr`: OCR fallback for textless PDF pages via `--pdf-ocr`, plus explicitly included `.png`, `.jpg`, and `.jpeg` files via `--image-ocr`
- `office`: `.docx`, `.xlsx`, `.pptx`, and `.ppsx` fallback converters
- `markitdown`: additional optional converter support
- `all`: `pdf`, `ocr`, `office`, and `tqdm`; include `markitdown` separately when needed

If `foldermix` is already installed through Homebrew and you need extras, switch to a Python tool install:

```bash
brew uninstall foldermix
uv tool install "foldermix[all,markitdown]"
```

## Quick Start

Run from the folder you want to pack:

```bash
foldermix pack . --out context.md
```

Inspect before packing:

```bash
foldermix list .
foldermix skiplist .
foldermix stats .
foldermix preview . README.md
```

Use a checked-in or local config when the workflow should be repeatable:

```bash
foldermix init --profile engineering-docs
foldermix pack . --config foldermix.toml --format md --out context.md --report report.json
```

Defaults worth knowing:

- Markdown (`md`) is the default output format.
- If you omit `--out`, `foldermix` writes a timestamped file whose extension matches `--format`, such as `foldermix_20260307_120000.md`.
- `foldermix.toml` values override built-in defaults; explicit CLI flags override config file values.

## Format Guidance

Choose the output format based on where the bundle goes next:

| Format | Choose it when | Output shape |
|---|---|---|
| Markdown (`md`) | You want a readable context file to paste into chat, inspect in an editor, or share with a human reviewer. | One document with metadata, table of contents, and fenced file blocks. |
| XML (`xml`) | You want explicit file boundaries for tools or prompts that parse tagged sections well. | One `<foldermix>` document with `<header>` metadata and `<files>` containing one `<file>` element per included file. |
| JSONL (`jsonl`) | You want streaming, indexing, or pipeline-friendly machine input. | One header object followed by one JSON object per file. |

Examples:

```bash
foldermix pack . --out context.md
foldermix pack . --format xml --out context.xml
foldermix pack . --format jsonl --out context.jsonl --report report.json
```

## Common Workflows

Use these as compact starting points. The longer docs-site guides cover [workflows](site-docs/workflows.md), [configuration](site-docs/configuration.md), [output formats and reports](site-docs/output-formats-and-reports.md), and [safety and troubleshooting](site-docs/safety-and-troubleshooting.md).

Config-first project bundle:

```bash
foldermix init --profile engineering-docs
foldermix pack . --config foldermix.toml --out context.md --report report.json
```

Legal or privacy-sensitive review:

```bash
foldermix init --profile legal
foldermix pack ./matter --config foldermix.toml --format md --out legal-context.md --report legal-report.json
```

Research corpus or batch input:

```bash
find ./corpus -type f -print0 | foldermix pack ./corpus --stdin --null --format jsonl --out research-context.jsonl --report research-report.json
```

Support incident bundle:

```bash
foldermix init --profile support
foldermix pack . --config foldermix.toml --format md --out support-context.md --report support-report.json
```

Course refresh bundle:

```bash
foldermix init --profile course-refresh
foldermix pack ./previous-course --config foldermix.toml --format md --out course-refresh-context.md --report course-refresh-report.json
```

Duplicate cleanup:

```bash
foldermix pack ./corpus --dedupe-content --report dedupe-report.json --out deduped-context.md
```
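Exact-duplicate suppression keeps the first file with a given content and skips later ones. One way to picture it, assuming files are visited in the deterministic order and compared by the SHA-256 of their raw bytes (foldermix may compare converted content instead):

```python
import hashlib


def dedupe(files: list[tuple[str, bytes]]) -> tuple[list[str], list[str]]:
    """Return (kept, skipped) paths; later exact duplicates are skipped."""
    seen: set[str] = set()
    kept: list[str] = []
    skipped: list[str] = []
    for path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            skipped.append(path)
        else:
            seen.add(digest)
            kept.append(path)
    return kept, skipped


kept, skipped = dedupe([("a.txt", b"same"), ("b.txt", b"other"), ("c.txt", b"same")])
print(kept, skipped)  # ['a.txt', 'b.txt'] ['c.txt']
```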

## Features

- **Multiple output formats**: Markdown, XML, JSONL
- **Smart filtering**: gitignore support, extension filters, glob patterns
- **Sensitive file protection**: automatically skips `.env`, keys, certificates
- **Optional converters**: PDF (pypdf), OCR-enhanced PDF fallback (rapidocr + pypdfium2), Office docs (python-docx, openpyxl, python-pptx for `.pptx`/`.ppsx`), markitdown
- **Core text-like formats**: plain text, markup, config/data files, and WebVTT (`.vtt`) via the built-in text converter
- **Notebook support**: built-in `.ipynb` conversion, with `--ipynb-include-outputs` to include or omit cell outputs
- **Spreadsheet noise reduction**: XLSX fallback skips low-signal `Copy of ...` tabs by default
- **Optional duplicate suppression**: skip later files whose content exactly matches an earlier included file
- **Redaction**: Email and phone number redaction via `--redact`
- **Line-level filtering**: Remove noisy lines via `--drop-line-containing`
- **Minimum line length**: Drop short lines via `--min-line-length`
- **SHA-256 checksums** per file
- **Parallel processing** with configurable workers
- **Table of contents** in Markdown output
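`--drop-line-containing` and `--min-line-length` both act on individual lines. A rough sketch of their combined effect, assuming literal substring matching (as documented for `--drop-line-containing`) and that length is measured without the trailing newline; foldermix's exact rules may differ:

```python
def filter_lines(text: str, drop_containing: list[str], min_length: int = 0) -> str:
    """Drop lines containing any literal substring or shorter than min_length."""
    kept = []
    for line in text.splitlines():
        if any(needle in line for needle in drop_containing):
            continue
        if len(line) < min_length:
            continue
        kept.append(line)
    return "\n".join(kept)


text = "DEBUG: noisy\nok\nreal content here\n"
print(filter_lines(text, ["DEBUG"], min_length=3))  # real content here
```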

## Config Precedence

`foldermix` resolves effective options in this deterministic order:

1. Built-in defaults
2. `foldermix.toml` values (`--config` or discovered file)
3. Explicit CLI flags
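The three layers can be modeled as successive dictionary overlays that also remember where each value came from, which is roughly what `--print-effective-config` reports. A sketch with hypothetical key names and source labels:

```python
def merge_layers(defaults: dict, config: dict, cli: dict) -> dict[str, tuple[object, str]]:
    """Overlay defaults < config file < CLI flags, tracking each value's source."""
    effective: dict[str, tuple[object, str]] = {}
    for source, layer in (("default", defaults), ("config", config), ("cli", cli)):
        for key, value in layer.items():
            effective[key] = (value, source)
    return effective


merged = merge_layers(
    defaults={"format": "md", "workers": 4, "redact": "none"},
    config={"redact": "emails", "workers": 8},
    cli={"workers": 2},
)
for key in sorted(merged):
    print(key, merged[key])
# format ('md', 'default')
# redact ('emails', 'config')
# workers (2, 'cli')
```

The CLI layer should only contain flags the user actually passed; otherwise a flag's built-in default would silently mask a config-file value.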

For diagnostics, `pack`, `list`, `skiplist`, `preview`, and `stats` can print the merged result (including the source of each key) and exit:

```bash
foldermix pack . --print-effective-config
foldermix list . --print-effective-config
foldermix stats . --print-effective-config
```

Config section guidance:

- `[pack]` is the source of truth for file-selection behavior used by `pack`, `list`, and `skiplist`
- `[stats]` remains separate for stats-specific defaults

## Starter Config Profiles

Use `foldermix init` to generate a commented starter `foldermix.toml` for common local workflows:

```bash
foldermix init --profile legal
foldermix init --profile research --out ./configs/foldermix.toml
foldermix init --profile support --force
foldermix init --profile course-refresh --out ./foldermix.toml --force
```

Available profiles:

- `legal` - privacy-first defaults with full redaction and OCR enabled.
- `research` - broad document coverage, including PowerPoint decks and slideshows, with OCR and email-only redaction.
- `support` - ticket/runbook focused filters with full redaction defaults.
- `engineering-docs` - technical docs profile with frontmatter stripping and no redaction.
- `course-refresh` - teaching-material bundle profile that excludes grades, rosters, responses, feedback, and other student/admin paths by default.

## Command Reference

```text
foldermix pack [OPTIONS] [PATH]

Options:
  --config PATH                 Path to foldermix TOML config file
  -o, --out PATH                Output file path
  -f, --format TEXT             Output format: md, xml, jsonl [default: md]
  --include-ext TEXT            Comma-separated extensions to include
  --exclude-ext TEXT            Comma-separated extensions to exclude
  --exclude-dirs TEXT           Comma-separated directory names to exclude
  --exclude-glob TEXT           Glob patterns to exclude
  --include-glob TEXT           Glob patterns to include
  --max-bytes INTEGER           Max bytes per file [default: 10000000]
  --max-total-bytes INTEGER     Max total bytes
  --max-files INTEGER           Max number of files
  --hidden                      Include hidden files
  --follow-symlinks             Follow symbolic links
  --respect-gitignore / --no-respect-gitignore  [default: respect]
  --workers INTEGER             Number of worker threads [default: 4]
  --progress                    Show progress bar (requires tqdm)
  --dry-run                     List files without packing
  --report PATH                 Write JSON report to path
  --continue-on-error           Skip files that fail to convert
  --on-oversize TEXT            skip or truncate [default: skip]
  --redact TEXT                 none, emails, phones, all [default: none]
  --drop-line-containing TEXT   Drop lines containing any provided literal substring (repeatable / CSV-compatible)
  --min-line-length INTEGER     Drop lines shorter than this character length [default: 0]
  --strip-frontmatter           Strip YAML frontmatter from files
  --include-sha256 / --no-include-sha256  [default: include]
  --include-toc / --no-include-toc        [default: include]
  --include-skipped-files / --no-include-skipped-files  Include a separate skipped-files section in Markdown output [default: disabled]
  --dedupe-content / --no-dedupe-content  Skip later files whose content exactly matches an earlier included file [default: disabled]
  --pdf-ocr / --no-pdf-ocr                Enable OCR fallback for textless PDF pages [default: disabled]
  --pdf-ocr-strict / --no-pdf-ocr-strict  Fail when OCR is needed but unavailable/empty [default: disabled]
  --image-ocr / --no-image-ocr            Enable OCR for included PNG/JPEG files [default: disabled]
  --image-ocr-strict / --no-image-ocr-strict  Fail when image OCR is needed but unavailable/empty [default: disabled]
  --policy-pack TEXT            Built-in policy rule bundle: strict-privacy, legal-hold, customer-support
  --fail-on-policy-violation / --no-fail-on-policy-violation  Fail command when policy findings meet threshold [default: disabled]
  --policy-fail-level TEXT      Minimum severity for policy-failure threshold: low, medium, high, critical [default: low]
  --policy-dry-run / --no-policy-dry-run  Evaluate policy outcomes without writing packed output [default: disabled]
  --policy-output TEXT          Policy dry-run output format: text, json [default: text]
  --stdin                       Read explicit file paths from standard input instead of recursive scanning
  --null                        Parse stdin as NUL-delimited paths (for find -print0); requires --stdin
  --print-effective-config      Print merged effective config with value sources and exit
```

Additional commands:

```text
foldermix list [OPTIONS] [PATH]
  --config PATH
  --include-ext TEXT
  --exclude-ext TEXT
  --exclude-dirs TEXT
  --exclude-glob TEXT
  --include-glob TEXT
  --max-bytes INTEGER
  --hidden
  --follow-symlinks
  --respect-gitignore / --no-respect-gitignore
  --on-oversize TEXT
  --stdin
  --null
  --print-effective-config

foldermix skiplist [OPTIONS] [PATH]
  --config PATH
  --include-ext TEXT
  --exclude-ext TEXT
  --exclude-dirs TEXT
  --exclude-glob TEXT
  --include-glob TEXT
  --max-bytes INTEGER
  --hidden
  --follow-symlinks
  --respect-gitignore / --no-respect-gitignore
  --on-oversize TEXT
  --conversion-check / --scan-only
  --stdin
  --null
  --print-effective-config

foldermix preview [OPTIONS] [PATH] [FILES]...
  --config PATH
  --format TEXT
  --include-ext TEXT
  --exclude-ext TEXT
  --hidden
  --respect-gitignore / --no-respect-gitignore
  --max-bytes INTEGER
  --on-oversize TEXT
  --continue-on-error
  --redact TEXT
  --drop-line-containing TEXT
  --min-line-length INTEGER
  --strip-frontmatter
  --include-sha256 / --no-include-sha256
  --include-toc / --no-include-toc
  --include-skipped-files / --no-include-skipped-files
  --pdf-ocr / --no-pdf-ocr
  --pdf-ocr-strict / --no-pdf-ocr-strict
  --image-ocr / --no-image-ocr
  --image-ocr-strict / --no-image-ocr-strict
  --stdin
  --null
  --print-effective-config

foldermix stats [OPTIONS] [PATH]
  --config PATH
  --include-ext TEXT
  --hidden
  --stdin
  --null
  --print-effective-config

foldermix init --profile <legal|research|support|engineering-docs|course-refresh> [--out PATH] [--force]

foldermix version
```

`--include-skipped-files` is an opt-in Markdown-only output feature. When enabled, foldermix keeps the regular Table of Contents limited to included files and adds a separate `Skipped Files` section near the top of the rendered Markdown output.

## Report Schema

`--report` writes a versioned JSON report with machine-actionable reason codes and policy findings, while preserving the existing human-readable fields.

- Current schema: `schema_version = 5`
- Compatibility policy:
  - Existing keys are preserved (`included_count`, `skipped_count`, `total_bytes`, `included_files`, `skipped_files`).
  - New top-level fields are additive (`schema_version`, `reason_code_counts`, `warning_code_counts`, `redaction_summary`, `policy_findings`, `policy_finding_counts`).
  - New per-entry fields are additive (`reason_code`, `message`, `outcome_codes`, `warning_codes`, `outcomes`, `redaction`).

Example `report.json` shape (abridged: `included_count` is 2, but only one included file is shown):

```json
{
  "schema_version": 5,
  "included_count": 2,
  "skipped_count": 1,
  "total_bytes": 1234,
  "included_files": [
    {
      "path": "big.txt",
      "size": 900,
      "ext": ".txt",
      "outcome_codes": ["OUTCOME_TRUNCATED", "OUTCOME_REDACTED"],
      "warning_codes": [],
      "outcomes": [
        {"code": "OUTCOME_TRUNCATED", "message": "File content was truncated to satisfy --max-bytes."},
        {"code": "OUTCOME_REDACTED", "message": "Content was redacted using mode 'emails'."}
      ],
      "redaction": {
        "mode": "emails",
        "event_count": 2,
        "categories": ["emails"]
      }
    }
  ],
  "skipped_files": [
    {
      "path": "image.png",
      "reason": "excluded_ext",
      "reason_code": "SKIP_EXCLUDED_EXT",
      "message": "Path is excluded by extension filtering."
    }
  ],
  "reason_code_counts": {
    "OUTCOME_REDACTED": 1,
    "OUTCOME_TRUNCATED": 1,
    "SKIP_EXCLUDED_EXT": 1
  },
  "warning_code_counts": {},
  "redaction_summary": {
    "mode": "emails",
    "files_with_redactions": 1,
    "event_count": 2,
    "categories": ["emails"]
  },
  "policy_findings": [
    {
      "rule_id": "convert-secret",
      "severity": "high",
      "action": "deny",
      "stage": "convert",
      "path": "notes.txt",
      "reason_code": "POLICY_CONTENT_REGEX_MATCH",
      "message": "Secret marker detected"
    }
  ],
  "policy_finding_counts": {
    "total": 1,
    "by_severity": {"high": 1},
    "by_action": {"deny": 1},
    "by_reason_code": {"POLICY_CONTENT_REGEX_MATCH": 1}
  }
}
```
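A CI step can read `report.json` and act on the documented keys. A minimal sketch that checks the schema version and groups skipped paths by reason code; it relies only on keys shown in the example above:

```python
import json

MIN_SCHEMA_VERSION = 5  # fields are additive, so newer versions should still parse


def summarize_report(raw: str) -> dict[str, list[str]]:
    """Group skipped paths by reason_code after checking schema_version."""
    report = json.loads(raw)
    version = report.get("schema_version")
    if not isinstance(version, int) or version < MIN_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    by_code: dict[str, list[str]] = {}
    for entry in report.get("skipped_files", []):
        # Fall back to the documented catch-all code if reason_code is absent.
        code = entry.get("reason_code", "SKIP_UNKNOWN")
        by_code.setdefault(code, []).append(entry["path"])
    return by_code


sample = json.dumps({
    "schema_version": 5,
    "skipped_files": [
        {"path": "image.png", "reason": "excluded_ext", "reason_code": "SKIP_EXCLUDED_EXT"},
    ],
})
print(summarize_report(sample))  # {'SKIP_EXCLUDED_EXT': ['image.png']}
```

Accepting any version at or above 5 leans on the additive compatibility policy; a stricter consumer could pin an exact version instead.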

Canonical reason-code groups:

- Skip reasons: `SKIP_HIDDEN`, `SKIP_EXCLUDED_DIR`, `SKIP_SENSITIVE`, `SKIP_GITIGNORED`, `SKIP_EXCLUDED_GLOB`, `SKIP_EXCLUDED_EXT`, `SKIP_UNREADABLE`, `SKIP_OVERSIZE`, `SKIP_OUTSIDE_ROOT`, `SKIP_MISSING`, `SKIP_NOT_FILE`, `SKIP_UNKNOWN` (fallback when a skip reason cannot be mapped to a specific code)
- Included-file outcomes: `OUTCOME_TRUNCATED`, `OUTCOME_REDACTED`, `OUTCOME_CONVERSION_WARNING`
- Warning taxonomy codes:
  `encoding_fallback`, `converter_unavailable`, `ocr_disabled`, `ocr_dependencies_missing`, `ocr_initialization_failed`, `ocr_failed`, `ocr_no_text`, `unclassified_warning`
- Policy finding reason codes: `POLICY_RULE_MATCH`, `POLICY_SKIP_REASON_MATCH`, `POLICY_CONTENT_REGEX_MATCH`, `POLICY_FILE_SIZE_EXCEEDED`, `POLICY_TOTAL_BYTES_EXCEEDED`, `POLICY_FILE_COUNT_EXCEEDED`
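Because the skip, outcome, and policy groups share stable prefixes, a consumer can bucket the counts in `reason_code_counts` without hard-coding every code. A sketch, assuming the prefix convention continues to hold for future codes:

```python
from collections import defaultdict

PREFIX_GROUPS = {"SKIP_": "skip", "OUTCOME_": "outcome", "POLICY_": "policy"}


def group_counts(reason_code_counts: dict[str, int]) -> dict[str, int]:
    """Sum reason-code counts by group prefix; unmatched codes go to 'other'."""
    totals: dict[str, int] = defaultdict(int)
    for code, count in reason_code_counts.items():
        group = next((g for p, g in PREFIX_GROUPS.items() if code.startswith(p)), "other")
        totals[group] += count
    return dict(totals)


print(group_counts({"OUTCOME_REDACTED": 1, "OUTCOME_TRUNCATED": 1, "SKIP_EXCLUDED_EXT": 1}))
# {'outcome': 2, 'skip': 1}
```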

Redaction traceability semantics:

- Per file (`included_files[].redaction`):
  - `mode`: configured redaction mode for the run (`none`, `emails`, `phones`, `all`)
  - `event_count`: number of replacements applied for that file
  - `categories`: redaction categories that matched (`emails`, `phones`)
- Run summary (`redaction_summary`):
  - `mode`: run-level mode (or `mixed` if inconsistent input is provided)
  - `files_with_redactions`: count of files where `event_count > 0`
  - `event_count`: total replacements across all included files
  - `categories`: union of categories matched across included files
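The run summary is an aggregation of the per-file entries. The sketch below recomputes it from `included_files[].redaction` objects following the semantics listed above; skipping files without a `redaction` key, and reporting `none` when there are no entries, are assumptions:

```python
def summarize_redactions(included_files: list[dict]) -> dict:
    """Rebuild redaction_summary from per-file redaction entries."""
    entries = [f["redaction"] for f in included_files if "redaction" in f]
    modes = {e["mode"] for e in entries}
    if len(modes) == 1:
        mode = next(iter(modes))
    elif not modes:
        mode = "none"  # assumption: no entries means no redaction was configured
    else:
        mode = "mixed"  # inconsistent per-file modes
    categories: set[str] = set()
    for e in entries:
        categories.update(e["categories"])
    return {
        "mode": mode,
        "files_with_redactions": sum(1 for e in entries if e["event_count"] > 0),
        "event_count": sum(e["event_count"] for e in entries),
        "categories": sorted(categories),
    }


files = [
    {"path": "a.txt", "redaction": {"mode": "all", "event_count": 2, "categories": ["emails"]}},
    {"path": "b.txt", "redaction": {"mode": "all", "event_count": 1, "categories": ["phones"]}},
    {"path": "c.txt", "redaction": {"mode": "all", "event_count": 0, "categories": []}},
]
print(summarize_redactions(files))
# {'mode': 'all', 'files_with_redactions': 2, 'event_count': 3, 'categories': ['emails', 'phones']}
```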

## Policy Engine Core

`foldermix` supports rule-based policy evaluation during scan, convert, and pack summary phases.
For an end-to-end compliance workflow (pack selection, enforcement, exit codes, and reason-code reference),
see [docs/compliance-safety.md](docs/compliance-safety.md).

Use `foldermix.toml` (`[pack]`) to define rules:

```toml
[[pack.policy_rules]]
rule_id = "convert-secret"
description = "Detect secret-like markers in converted content"
stage = "convert" # scan | convert | pack | any
severity = "high" # low | medium | high | critical
action = "deny"   # warn | deny
content_regex = "SECRET_[0-9]+"
```

Each rule must include at least one matcher key:
`path_glob`, `ext_in`, `skip_reason_in`, `content_regex`, `max_size_bytes`, `max_total_bytes`, or `max_file_count`.
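The matcher requirement and a `content_regex` check can be sketched in a few lines. This illustrates the documented rule shape and is not foldermix's validator; the error text and the use of `re.search` (match anywhere in the converted content) are assumptions:

```python
import re

MATCHER_KEYS = {"path_glob", "ext_in", "skip_reason_in", "content_regex",
                "max_size_bytes", "max_total_bytes", "max_file_count"}
ALLOWED = {
    "stage": {"scan", "convert", "pack", "any"},
    "severity": {"low", "medium", "high", "critical"},
    "action": {"warn", "deny"},
}


def validate_rule(rule: dict) -> None:
    """Raise ValueError if a rule has no matcher key or an invalid enum value."""
    if not rule.keys() & MATCHER_KEYS:
        raise ValueError(f"rule {rule.get('rule_id')!r} has no matcher key")
    for key, allowed in ALLOWED.items():
        if key in rule and rule[key] not in allowed:
            raise ValueError(f"rule {rule.get('rule_id')!r}: invalid {key} {rule[key]!r}")


def content_matches(rule: dict, text: str) -> bool:
    """Return True when the rule's content_regex is found anywhere in text."""
    pattern = rule.get("content_regex")
    return bool(pattern) and re.search(pattern, text) is not None


rule = {"rule_id": "convert-secret", "stage": "convert", "severity": "high",
        "action": "deny", "content_regex": "SECRET_[0-9]+"}
validate_rule(rule)
print(content_matches(rule, "token SECRET_42 here"))  # True
```

Loading `[[pack.policy_rules]]` from TOML (for example with `tomllib`) yields a list of plain dictionaries, so each entry could be passed to `validate_rule` directly.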

### Built-in Policy Packs

Use `--policy-pack` to apply a built-in rule bundle:

```bash
foldermix pack . --policy-pack strict-privacy --report report.json
```

Or persist it in `foldermix.toml`:

```toml
[pack]
policy_pack = "strict-privacy" # strict-privacy | legal-hold | customer-support
```

Pack intents and tradeoffs:

- `strict-privacy`:
  prioritize deny-level findings for direct PII/secret markers; higher false-positive tolerance.
- `legal-hold`:
  advisory warnings for legal-retention signals (privileged/destruction markers, hidden-scan coverage).
- `customer-support`:
  advisory findings focused on contact PII and log-like support artifacts.

`policy_pack` rules are combined with explicit `policy_rules` (pack rules first, then custom rules).
Unknown pack names fail with a clear validation error.

### Policy Enforcement Flags (CI/Automation)

Enable deterministic policy-based failure in automation:

```bash
foldermix pack . \
  --policy-pack strict-privacy \
  --fail-on-policy-violation \
  --policy-fail-level high \
  --report report.json
```

Semantics:

- `--fail-on-policy-violation` enables enforcement mode.
- Only policy findings with `action = "deny"` are enforcement-failing.
- `--policy-fail-level` sets the minimum severity for those deny findings (`low`, `medium`, `high`, `critical`).
- Findings are still reported in terminal summary and `--report` output before exiting.
- Enforcement failures exit with code `4`.
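The enforcement decision reduces to a filter over findings. A sketch of the documented semantics (deny-only, severity at or above the threshold, exit code `4`); returning `0` when enforcement does not trigger is an assumption about the success path:

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]
POLICY_FAILURE_EXIT_CODE = 4


def enforcement_exit_code(findings: list[dict], fail_level: str = "low", enforce: bool = False) -> int:
    """Return 4 if enforcement is on and a deny finding meets the threshold, else 0."""
    if not enforce:
        return 0
    threshold = SEVERITY_ORDER.index(fail_level)
    for finding in findings:
        if finding["action"] == "deny" and SEVERITY_ORDER.index(finding["severity"]) >= threshold:
            return POLICY_FAILURE_EXIT_CODE
    return 0


findings = [
    {"rule_id": "a", "action": "warn", "severity": "critical"},
    {"rule_id": "b", "action": "deny", "severity": "medium"},
]
print(enforcement_exit_code(findings, fail_level="high", enforce=True))    # 0: deny finding is below "high"
print(enforcement_exit_code(findings, fail_level="medium", enforce=True))  # 4
```

Note that the `warn` finding never fails the run, even at `critical` severity.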

### Policy Dry-Run / Explain Mode

Preview policy impact without writing a packed output bundle:

```bash
foldermix pack . \
  --policy-pack strict-privacy \
  --policy-dry-run
```

For machine-readable automation output:

```bash
foldermix pack . \
  --policy-pack strict-privacy \
  --policy-dry-run \
  --policy-output json
```

Semantics:

- `--policy-dry-run` executes scan/convert/pack policy evaluation but skips bundle write.
- Text mode prints a deterministic summary and affected-file list.
- `--policy-output json` emits a deterministic JSON payload to stdout for CI/automation.
- `--policy-output` requires `--policy-dry-run`.
- `--dry-run` and `--policy-dry-run` are mutually exclusive.

## Troubleshooting

- `--null` requires `--stdin`
  - `--null` is only valid when reading explicit paths from standard input.
- `No module named ...` or converter-specific warnings for PDF/Office/OCR
  - Homebrew installs the core feature set only. For optional converter stacks, use one of:
    - `uv tool install "foldermix[all,markitdown]"`
    - `pipx install "foldermix[all,markitdown]"`
    - `pip install "foldermix[pdf]"`, `pip install "foldermix[ocr]"`, `pip install "foldermix[office]"` in a virtualenv
- Need OCR for standalone images
  - image files remain excluded by default; explicitly include them, for example:
    `foldermix pack . --include-ext .png,.jpg,.jpeg --image-ocr`
- Expected files are missing from output
  - run `foldermix list . --config foldermix.toml` and `foldermix skiplist . --config foldermix.toml` to inspect include/skip behavior.
  - check `.gitignore`, hidden-path defaults, extension/glob filters, and sensitive-file protection.
- Need to see exactly which layer set each value
  - use `--print-effective-config` on `pack`, `list`, `skiplist`, `preview`, or `stats`.
- `stdin` path list includes files outside target root
  - these are skipped with structured reason codes (for example `SKIP_OUTSIDE_ROOT`) and included in `--report`.

## Security

See [SECURITY.md](SECURITY.md) for details on sensitive file handling.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.

Maintainer docs:

- [Maintainer playbook](docs/maintainer-playbook.md) for PR triage, coverage recovery, release, and tap troubleshooting
- [OCR validation set workflow](docs/ocr_validation.md)
- [Homebrew core preparation](docs/homebrew-core.md)

---

## Developer Guide

### Dev Setup

```bash
pip install uv
uv venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
uv pip install -e ".[dev,all]"
pre-commit install
```

### Lint

```bash
ruff check .
ruff format .
pre-commit run --all-files
```

The CI `lint` job runs `ruff check . && ruff format --check .` on every push and pull request.
The repository also ships a `.pre-commit-config.yaml`; after `pre-commit install`, local `git commit` runs Ruff lint/format plus the fast pytest hook before the commit is created.

### Running Tests

```bash
# Fast unit/smoke tests (excludes integration & slow markers; no coverage gate)
pytest -m "not integration and not slow" -o addopts=

# Full suite with branch coverage (gate: ≥ 98%)
pytest --cov=foldermix --cov-branch --cov-fail-under=98 tests/

# Integration/snapshot tests only (no coverage gate)
pytest -m integration -o addopts=

# Performance smoke test (opt-in via env var)
FOLDERMIX_RUN_PERF_SMOKE=1 pytest tests/test_perf_smoke.py -q -o addopts=

# Mutation testing (install extra first)
pip install -e ".[dev,mutation,all]"
python -m mutmut run
python -m mutmut results
```

### Test Suite Overview

| File | Marker | What it covers |
|------|--------|----------------|
| `test_cli.py` | — | CLI argument validation, config construction, `pack`/`list`/`skiplist`/`preview`/`stats`/`version` commands |
| `test_cli_entrypoint.py` | — | CLI entry-point smoke (`foldermix --help`) |
| `test_converters.py` | — | Converter registry: PDF, Office, markitdown, plain-text |
| `test_converters_fallback.py` | — | Fallback behaviour when optional extras are absent |
| `test_packer.py` | — | Core `packer.pack()` logic, error handling, oversize policy |
| `test_packer_edges.py` | — | Edge cases: symlinks, hidden files, max-file limits, report output |
| `test_scanner.py` | — | File scanner: gitignore, extension filters, glob patterns |
| `test_scanner_edge.py` | — | Scanner edge cases: circular symlinks, deeply nested dirs |
| `test_scanner_properties.py` | — | Hypothesis-based property tests for the scanner |
| `test_snapshot_guard.py` | — | Fast guard that snapshot fixtures in `tests/integration/fixtures/expected/` are in sync with the packer |
| `test_utils.py` | — | Utility helpers (redaction, frontmatter stripping, SHA-256) |
| `test_version_module.py` | — | `foldermix.__version__` is set and non-empty |
| `test_writers.py` | — | All three writer classes (Markdown, XML, JSONL) round-trip |
| `test_writers_edge.py` | — | Writer edge cases: empty bundles, special characters, large content |
| `test_render_homebrew_formula.py` | — | Formula renderer helpers |
| `test_perf_smoke.py` | `slow` | Packs 1,500 synthetic files; asserts wall-clock ≤ 25 s and peak RSS ≤ 256 MB |
| `integration/test_pack_outputs.py` | `integration` | Golden-file snapshot tests: Markdown, XML, JSONL output match fixture files |
| `integration/test_pack_outputs_structured.py` | `integration` | Structured assertions on actual pack output (TOC, SHA-256, XML structure) |
| `integration/test_converters_real_files.py` | `integration` | Real-file converter tests (PDF, docx, xlsx, pptx, ppsx) |

Snapshot fixtures live in `tests/integration/fixtures/`:

```
tests/integration/fixtures/
├── simple_project/          # input tree used by snapshot tests
│   ├── alpha.md
│   ├── code.py
│   └── nested/
└── expected/                # golden output files
    ├── simple_project.md
    ├── simple_project.xml
    └── simple_project.jsonl
```

### CI Workflows

| Workflow file | Trigger | Jobs |
|---------------|---------|------|
| `ci.yml` | Every push / PR | `lint` → `smoke` (Python 3.10–3.12 on Ubuntu; Python 3.12 on macOS & Windows) → `minimal-deps` → `package-smoke` → `full` (coverage gate + Codecov) → `publish-pypi` → `update-homebrew-tap` → `release-consumer-smoke-pypi` + `release-consumer-smoke-homebrew` |
| `mutation.yml` | Weekly (Sat 09:00 UTC) + `workflow_dispatch` | `mutmut` on core source modules |
| `perf-smoke.yml` | Weekly (Sun 09:00 UTC) + `workflow_dispatch` | Performance smoke test (1,500 files, ≤ 25 s) |
| `security-audit.yml` | Weekly (Mon 09:00 UTC) + `pyproject.toml` changes + `workflow_dispatch` | `pip-audit` dependency vulnerability scan |

**`ci.yml` job details:**

- **`lint`** – Runs `ruff check` and `ruff format --check`.
- **`smoke`** – Runs unit/smoke tests (excludes `integration` and `slow` markers) across five OS/Python combinations.
- **`minimal-deps`** – Installs only `.[dev]` (no optional extras) and runs the core test files to confirm nothing is accidentally coupled to optional dependencies.
- **`package-smoke`** – Builds a wheel with `python -m build`, installs it in a clean venv, then exercises the CLI with black-box shell assertions.
- **`full`** – Runs the complete pytest suite with `--cov-report=xml` and uploads the coverage report to Codecov. Requires all earlier jobs to pass.
- **`publish-pypi`** – Runs only on pushes to `main`. Detects a version bump in `pyproject.toml` by comparing `HEAD` against `HEAD^`. If a bump is detected, builds and publishes to PyPI via OIDC trusted publishing.
- **`update-homebrew-tap`** – Runs after a successful `publish-pypi`. Calls `scripts/render_homebrew_formula.py` to generate a new Homebrew formula and pushes it to `foldermix/homebrew-foldermix` using the `HOMEBREW_TAP_GITHUB_TOKEN` secret.
- **`release-consumer-smoke-pypi`** – Runs on release pushes (a push to `main` that bumps the version). Installs `foldermix==<released_version>` from PyPI on Linux and runs black-box `version`/`list`/`pack` checks.
- **`release-consumer-smoke-homebrew`** – Runs on release pushes after the tap update. Installs from the `foldermix/foldermix` tap on macOS and runs black-box `version`/`list`/`pack` checks.
- Both release-consumer jobs upload diagnostic artifacts (`release-consumer-logs`) to simplify install/runtime failure triage.

### Release PR Process

A release is triggered by merging a PR to `main` that bumps the `version` field in `pyproject.toml`. The following checklist describes a complete release PR:

1. **Bump the version** in `pyproject.toml`:
   ```toml
   [project]
   version = "X.Y.Z"
   ```

2. **Update snapshot fixtures** if any packer output has changed:
   - Run the integration tests locally to detect fixture drift:
     ```bash
     pytest -m integration
     ```
   - If `test_pack_outputs.py` or `test_snapshot_guard.py` fail with a diff, review the diff, copy the fresh packer output into `tests/integration/fixtures/expected/`, and commit the updated fixtures as part of the same PR.

3. **Run the full test suite** locally and confirm all tests pass:
   ```bash
   pytest --cov=foldermix tests/
   ```

4. **Open the PR** targeting `main` and wait for all CI jobs to pass.

5. **Merge to `main`**. The `publish-pypi` job will detect the version bump, build the wheel, and publish to PyPI automatically. The `update-homebrew-tap` job will then update the Homebrew formula, and release-consumer smoke jobs will validate fresh installs from PyPI and, when tap credentials are configured, from Homebrew.

> **Note:** If `HOMEBREW_TAP_GITHUB_TOKEN` is not configured, both tap update and Homebrew release-consumer smoke are skipped. Configure it as a repository secret with write access to `foldermix/homebrew-foldermix` before the first release.

For maintainers preparing a possible `homebrew/core` submission, see [docs/homebrew-core.md](docs/homebrew-core.md).

## License

See [LICENSE](LICENSE).
