Metadata-Version: 2.4
Name: merger-cli
Version: 2.0.1
Summary: Merger is a tool that scans a directory, filters files using customizable patterns, and merges readable content into a single output file.
Author-email: Diogo Toporcov <diogotoporcov@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/diogotoporcov/merger-cli
Project-URL: Documentation, https://github.com/diogotoporcov/merger-cli
Keywords: merger,file system,concatenation,automation,development
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: chardet>=5.2.0
Requires-Dist: filetype>=1.2.0
Dynamic: license-file

# Merger CLI

[![Python](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![PyPI](https://img.shields.io/pypi/v/merger-cli.svg?color=orange)](https://pypi.org/project/merger-cli/)

Merger is a command-line utility for developers that scans a directory, filters files using customizable ignore patterns, and merges all readable content into a single **structured JSON output file**. It supports custom file parsers, making it easily extendable for formats such as `.pdf` or any domain-specific format.

---

## Summary

1. [Core Features](#core-features)
2. [Dependencies](#dependencies)
3. [Installation with PyPI](#installation-with-pypi)
4. [Build and Install Locally](#build-and-install-locally)
5. [Usage](#usage)
6. [Custom Parsers](#custom-parsers)
7. [Output Format](#output-format)
8. [CLI Options](#cli-options)
9. [License](#license)

---

## Core Features

* **Recursive merge** of all readable files under a root directory.
* **Glob-based ignore patterns** using `.gitignore`-style syntax.
* **Automatic binary validation and parsing**.
* **Modular parser system** for custom formats.
* **CLI support** for installation, removal, and listing of custom parsers.
* **Structured JSON merged output**, including a file tree.

---

## Dependencies

| Component    | Version | Notes    |
|--------------|---------|----------|
| **Python**   | ≥ 3.8   | Required |
| **chardet**  | ≥ 5.2.0 | Required |
| **filetype** | ≥ 1.2.0 | Required |

All dependencies are listed in [`requirements.txt`](requirements.txt).

---

## Installation with PyPI

```bash
pip install merger-cli
```

---

## Build and Install Locally

### 1. Clone the repository

```bash
git clone https://github.com/diogotoporcov/merger-cli.git
cd merger-cli
```

### 2. Create and activate a virtual environment

**Linux / macOS**

```bash
python -m venv .venv
source .venv/bin/activate
```

**Windows (PowerShell)**

```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

### 3. Install dependencies

```bash
pip install -r requirements.txt
```

### 4. Install as CLI tool

```bash
pip install .
```

---

## Usage

### Basic merge

```bash
merger ./src
```

When no output path is given, the result is written to `./merger.json`.

---

### Custom ignore patterns

```bash
merger ./project ./output.json --ignore "*.log" "__pycache__" "*.tmp"
```
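
To get a feel for how glob patterns like the ones above select paths, the sketch below filters a list of relative paths with Python's standard `fnmatch` module. It is only an illustration, not Merger's actual matching code: the helper names `is_ignored` and `filter_paths` are hypothetical, and full `.gitignore` semantics (negation, anchoring, `**`) are not reproduced.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """Return True if the full path or any single path component matches a pattern."""
    parts = PurePosixPath(rel_path).parts
    return any(
        fnmatchcase(rel_path, pattern)
        or any(fnmatchcase(part, pattern) for part in parts)
        for pattern in patterns
    )


def filter_paths(paths: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only the paths that no pattern ignores (hypothetical helper)."""
    patterns = list(patterns)
    return [p for p in paths if not is_ignored(p, patterns)]


paths = [
    "src/main.py",
    "src/__pycache__/main.cpython-311.pyc",
    "logs/app.log",
    "build/cache.tmp",
    "README.md",
]
kept = filter_paths(paths, ["*.log", "__pycache__", "*.tmp"])
print(kept)  # ['src/main.py', 'README.md']
```

Matching each path component separately is what lets a bare directory name such as `__pycache__` exclude everything beneath it.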

---

### Custom ignore file

```bash
merger . ./output.json --merger-ignore "C:\Users\USER\Desktop\merger.ignore"
```

---

### Verbose output

```bash
merger ./src ./merger.json --log-level DEBUG
```

---

## Custom Parsers

Merger uses **parser strategies** to support parsing of non-text file formats.

---

### Parser Abstract Class

All parsers must inherit from `Parser`:

```python
from merger.parsing.parser import Parser
```

Required structure:

* `EXTENSIONS: Set[str]`
* `CHUNK_BYTES_FOR_VALIDATION: Optional[int]`
* `validate(cls, file_chunk_bytes, *, file_path=None, logger=None) -> bool`
* `parse(cls, file_bytes, *, file_path=None, logger=None) -> str`

---

### Installing a Custom Parser

```bash
merger --install-module path/to/parser.py
```

To uninstall a module:

```bash
merger --uninstall-module <module_id>
```

To remove all modules (quote the `*` so the shell does not expand it to file names):

```bash
merger --uninstall-module "*"
```

To list installed modules:

```bash
merger --list-modules
```

---

### Custom Parser Example (PDF)

```python
import logging
from pathlib import Path
from typing import Optional, Set, Type, Union

import fitz

from merger.parsing.parser import Parser


class PdfParser(Parser):
    EXTENSIONS: Set[str] = {".pdf"}
    CHUNK_BYTES_FOR_VALIDATION: Optional[int] = None

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None
    ) -> bool:
        """
        Validate that the given file represents a readable PDF document.

        Args:
            file_chunk_bytes: Binary contents of the file being validated, sufficient to perform validation.
            file_path: Path of the file being validated.
            logger: Optional logger instance for logging.

        Returns:
            bool: True if the file is a readable PDF, False otherwise.
        """
        try:
            with fitz.open(file_path) as doc:
                _ = doc[0]
            return True

        except Exception:
            return False

    @classmethod
    def parse(
        cls,
        file_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None,
    ) -> str:
        """
        Extracts and concatenates text from all pages of a PDF file.

        Args:
            file_bytes: Binary contents of the file being parsed.
            file_path: Path of the file being parsed.
            logger: Optional logger instance for logging.

        Returns:
            str: Text of all pages, joined with single spaces.
        """
        texts = []
        with fitz.open(stream=file_bytes) as doc:
            for page in doc:
                text = page.get_text()
                if text:
                    text = text.replace("\n\n", "")
                    texts.append(text)

        full_text = " ".join(texts)
        return full_text


parser_cls: Type[Parser] = PdfParser
```

> The module **must expose a `parser_cls` object** referencing the parser class.

This implementation is available at [`examples/custom_parsers/pdf_parser.py`](examples/custom_parsers/pdf_parser.py).

---

## Output Format

The merged result is a single JSON file containing:

* Directory tree 
* File and directory names
* Relative paths to the root
* Extracted text content
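
The exact key names in the output are not documented here, so the sketch below inspects a merged file without assuming any: it loads the JSON and lists each key with its value type down to a fixed depth. The function names `summarize` and `summarize_file` are hypothetical helpers, not part of Merger.

```python
import json
from pathlib import Path
from typing import Any, List


def summarize(node: Any, depth: int = 0, max_depth: int = 2) -> List[str]:
    """Return indented 'key: type' lines describing parsed JSON."""
    lines: List[str] = []
    indent = "  " * depth
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(summarize(value, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Describe only the first element; siblings usually share its shape.
        lines.extend(summarize(node[0], depth, max_depth))
    return lines


def summarize_file(path: str = "merger.json") -> List[str]:
    """Load a merged JSON file and summarize its structure."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return summarize(data)


if __name__ == "__main__" and Path("merger.json").exists():
    print("\n".join(summarize_file("merger.json")))
```

Running it next to a freshly generated `merger.json` is a quick way to see the structure before writing code that consumes the output.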

---

## CLI Options

| Option                   | Description                                                |
|--------------------------|------------------------------------------------------------|
| `input_dir`              | Root directory to scan for files                           |
| `output_path`            | Path to save merged JSON output (default: `./merger.json`) |
| `-i, --install-module`   | Install a custom parser module                             |
| `-u, --uninstall-module` | Uninstall a parser module by ID (`*` removes all)          |
| `-l, --list-modules`     | List installed parser modules                              |
| `--ignore`               | Glob-style ignore patterns                                 |
| `-mi, --merger-ignore`   | Ignore file (default: `./merger.ignore`)                   |
| `--version`              | Show installed version                                     |
| `-ll, --log-level`       | Set logging verbosity                                      |

---

## License

This project is licensed under the MIT License — see [LICENSE](LICENSE) for details.
