Metadata-Version: 2.4
Name: diffino-cli
Version: 0.2.2
Summary: Declarative data diff engine for tables, powered by Polars. Output Excel, HTML, or Typst PDF.
License-Expression: MIT
Project-URL: Repository, https://codeberg.org/songwupei/diffino
Keywords: diff,excel,csv,comparison,polars,openpyxl
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: polars>=0.20.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: typer>=0.9.0
Requires-Dist: duckdb>=0.10.0
Dynamic: license-file

# diffino

Declarative data diff engine for tables, powered by Polars. Compare Excel, CSV, or Parquet files and generate detailed reports with character-level inline diffs.

## Installation

```bash
pip install diffino-cli
```

## Quick Start

1. Prepare a config file:

```yaml
sources:
  left:
    type: excel
    path: data/left.xlsx
  right:
    type: excel
    path: data/right.xlsx

compare:
  - left_sheet: Sheet1
    key_columns:
      - ID
    ignore_columns:
      - Notes

output:
  formats:
    - excel
    - html
```

2. Run diffino:

```bash
diffino run config.yaml
```

3. View the generated reports.

## Features

- **Multi-format sources**: Excel, CSV, Parquet (DuckDB planned)
- **Key-based or fingerprint matching**: Compare by composite keys or full-row hashes
- **Column preprocessing**: Decimal rounding, text normalization, case sensitivity control
- **Character-level inline diff**: Precise highlighting of changed text, with red strikethrough for deleted text and green for inserted text
- **Three output formats**:
  - **Excel**: Side-by-side old/new rows, yellow-highlighted changed cells with rich-text inline diffs (`[-deleted-]`, `[+inserted+]`)
  - **HTML**: Self-contained report with `<del>`/`<ins>` tags and JS filtering
  - **Typst**: Currently a stub; full template support is planned
- **Excel styles**:
  - `track`: old data as the baseline with changes overlaid inline, like Word track changes
  - `final`: new values with yellow highlights on changed cells
  - `side_by_side`: old row (red) + new row (green) for visual comparison
- **Change log sheet**: Summary of all changes in a dedicated sheet
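
The inline markers mentioned above (`[-deleted-]`, `[+inserted+]`) can be illustrated with a minimal sketch built on the standard library's `difflib`. This is not diffino's implementation; the function name `inline_diff` is hypothetical and only shows the general idea of a character-level diff.

```python
from difflib import SequenceMatcher


def inline_diff(old: str, new: str) -> str:
    """Hypothetical helper: mark character-level changes between two cell values.

    Deleted text is wrapped as [-...-] and inserted text as [+...+].
    """
    parts = []
    # autojunk=False keeps the matcher from skipping frequent characters
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(old[i1:i2])
        if tag in ("delete", "replace"):
            parts.append(f"[-{old[i1:i2]}-]")
        if tag in ("insert", "replace"):
            parts.append(f"[+{new[j1:j2]}+]")
    return "".join(parts)


print(inline_diff("cat", "cut"))  # c[-a-][+u+]t
```

A report writer could then turn the bracketed spans into strikethrough and colored text for the target format.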

## CLI Commands

```bash
diffino run config.yaml       # Run comparison
diffino validate config.yaml  # Validate config only
```

## Configuration Reference

See `config.example.yaml` for a complete example. Key options:

| Section | Field | Description |
|---|---|---|
| `sources` | `left` / `right` | File type (`excel`, `csv`, `parquet`) and path |
| `compare[]` | `left_sheet` | Sheet name in left file |
| `compare[]` | `right_sheet` | Sheet name in right file (defaults to `left_sheet`) |
| `compare[]` | `key_columns` | Column names used for row matching |
| `compare[]` | `fingerprint` | Use full-row hash instead of key columns |
| `compare[]` | `ignore_columns` | Columns to exclude from comparison |
| `compare[]` | `column_rules` | Preprocessing rules (`decimal`, `text`) |
| `output` | `formats` | List of `excel`, `html`, `typst` |
| `output.excel` | `style` | `track`, `final`, or `side_by_side` |
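
To make the difference between `key_columns` and `fingerprint` matching concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not diffino's code (which is built on Polars); `row_fingerprint` and `match_rows` are hypothetical names, and rows are modeled as plain dictionaries.

```python
import hashlib


def row_fingerprint(row: dict, ignore=()) -> str:
    """Hypothetical helper: hash all non-ignored column values into one digest."""
    payload = "\x1f".join(f"{k}={row[k]}" for k in sorted(row) if k not in ignore)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def match_rows(left, right, key_columns=None, ignore=()):
    """Return (removed, added, changed) row identifiers.

    With key_columns, rows pair up by key and may be reported as changed.
    Without them, the full-row hash is the identity, so any edit shows up
    as one removed row plus one added row.
    """
    if key_columns:
        def key(row):
            return tuple(row[c] for c in key_columns)
    else:
        def key(row):
            return row_fingerprint(row, ignore)

    left_by_key = {key(r): r for r in left}
    right_by_key = {key(r): r for r in right}
    removed = [k for k in left_by_key if k not in right_by_key]
    added = [k for k in right_by_key if k not in left_by_key]
    changed = [
        k for k in left_by_key
        if k in right_by_key
        and row_fingerprint(left_by_key[k], ignore)
        != row_fingerprint(right_by_key[k], ignore)
    ]
    return removed, added, changed


left = [{"ID": 1, "Amount": "10", "Notes": "a"},
        {"ID": 2, "Amount": "20", "Notes": "b"}]
right = [{"ID": 1, "Amount": "10", "Notes": "z"},
         {"ID": 2, "Amount": "25", "Notes": "b"},
         {"ID": 3, "Amount": "5", "Notes": ""}]

removed, added, changed = match_rows(left, right, key_columns=["ID"], ignore=("Notes",))
print(removed, added, changed)  # [] [(3,)] [(2,)]
```

Key-based matching suits tables with a stable identifier, while fingerprint matching is a fallback for tables without one, at the cost of not being able to pair an edited row with its earlier version.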

## License

MIT
