Metadata-Version: 2.4
Name: tha-csv-runner
Version: 0.2.5
Summary: Run a function over every row of a CSV — with progress, header validation, and structured per-row errors.
Project-URL: Homepage, https://github.com/tha-guy-nate/tha-csv-runner
Project-URL: Issues, https://github.com/tha-guy-nate/tha-csv-runner/issues
Author: Nate Wright
License: MIT
License-File: LICENSE
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: tqdm>=4.66
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Description-Content-Type: text/markdown

# tha-csv-runner

[![CI](https://github.com/tha-guy-nate/tha-csv-runner/actions/workflows/ci.yml/badge.svg)](https://github.com/tha-guy-nate/tha-csv-runner/actions/workflows/ci.yml)

A small Python library that runs a function against every row of a CSV — with a progress bar, required header validation, and structured error capture per row.

## Install

```bash
pip install tha-csv-runner
```

## Quick start

```python
from tha_csv_runner import ThaCSV

def process(row: dict) -> None:
    """Raise any exception to mark the row as an error. Return value is ignored."""
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")

runner = ThaCSV()

rows = runner.read("Step 1 of 2", "data.csv", ["name", "email"], process)
runner.write("Step 2 of 2", "output.csv")
```

## How it works

1. Opens the CSV and validates that all `required_headers` are present; raises `ConfigError` immediately if any are missing
2. Iterates every row with a `tqdm` progress bar labelled with `desc`
3. Calls your `validator(row)` function — if it raises, that row is marked as an error and processing continues
4. Appends three columns to every row: `row number`, `row status`, and `message`
   - `row number` starts at 2 (row 1 is the header)
   - On success: `row status` and `message` are blank
   - On error: `row status = "error"`, `message = str(exception)`
5. `write()` writes all rows (success and error) to a CSV
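
The read steps above can be approximated with the standard library alone. This is a minimal sketch, not the library's implementation: `run_rows` and `check_email` are hypothetical names, the `tqdm` progress bar is left out, and a plain `ValueError` stands in for the library's `ConfigError`.

```python
import csv
import io
from typing import Callable, Iterable, Optional


def run_rows(
    lines: Iterable[str],
    required_headers: list[str],
    validator: Optional[Callable[[dict], None]] = None,
) -> list[dict]:
    """Hypothetical stand-in for the read step, without the progress bar."""
    reader = csv.DictReader(lines)

    # Step 1: fail fast if any required header is missing.
    present = reader.fieldnames or []
    missing = [h for h in required_headers if h not in present]
    if missing:
        raise ValueError(f"missing required headers: {missing}")

    rows = []
    # Steps 2-4: numbering starts at 2 because row 1 is the header.
    for number, row in enumerate(reader, start=2):
        status, message = "", ""
        if validator is not None:
            try:
                validator(row)
            except Exception as exc:  # record the failure and keep going
                status, message = "error", str(exc)
        row.update({"row number": number, "row status": status, "message": message})
        rows.append(row)
    return rows


def check_email(row: dict) -> None:
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")


# Made-up sample data: one valid row and one that fails validation.
sample = io.StringIO("name,email\nAda,ada@example.com\nBob,bob@other.org\n")
result = run_rows(sample, ["name", "email"], check_email)
```

Here the second row ends up with `row status` set to `"error"` and the exception text in `message`, while the first row keeps both columns blank.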

## API

### `ThaCSV`

```python
ThaCSV()
```

### `runner.read()`

```python
runner.read(
    "Step 1 of 2",           # progress bar label — pass None to use the filename
    "data.csv",              # path to input CSV
    ["a", "b"],              # columns that must exist — raises ConfigError if missing
    validator=my_func,       # optional: callable(row: dict) -> None
    enrich=True,             # optional: set False to skip row number/status/message columns
)
```

Reads and processes all rows. Returns the rows as a `list[dict]` (same object as `runner.rows`).

When `enrich=False`, validator exceptions are re-raised instead of captured.
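
Because the enriched rows carry a `row status` column, the list returned by `read()` can be split into successes and failures with ordinary list handling. The sketch below assumes rows shaped as described under "How it works"; `split_by_status` is a hypothetical helper, and the sample rows are made up for illustration.

```python
def split_by_status(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (ok_rows, error_rows) based on the `row status` column."""
    ok_rows = [r for r in rows if r.get("row status") != "error"]
    error_rows = [r for r in rows if r.get("row status") == "error"]
    return ok_rows, error_rows


# Rows shaped like enriched output (illustrative data, not real library output).
rows = [
    {"name": "Ada", "email": "ada@example.com",
     "row number": 2, "row status": "", "message": ""},
    {"name": "Bob", "email": "bob@other.org",
     "row number": 3, "row status": "error", "message": "invalid email domain"},
]

ok_rows, error_rows = split_by_status(rows)
for r in error_rows:
    print(f"row {r['row number']}: {r['message']}")
```

Either list could then be handed to `write()` through its `rows` argument, for example to produce an errors-only file.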

### `runner.write()`

```python
runner.write(
    "Step 2 of 2",                     # progress bar label — pass None for "Writing {stem} CSV"
    output_path="output.csv",          # optional — auto-named input_processed_TIMESTAMP.csv if omitted
    rows=my_rows,                      # optional — use these rows instead of runner.rows
    sort_by="name",                    # optional — column name, or list of column names
    ascending=True,                    # optional — bool or list of bools matching sort_by
    column_order=["name", "email"],    # optional — listed columns come first, rest follow
    keep=["name", "email"],            # optional — keep only these columns (mutually exclusive with drop)
    drop=["row number"],               # optional — remove these columns (mutually exclusive with keep)
    chunk_size=1000,                   # optional — split output into files of this many rows
)
```

Prints `✅ Done! CSV was written to: {path}` on completion. Override by setting `runner.status_cb = my_fn`.

Returns the `Path` that was written, or a `list[Path]` when `chunk_size` is set.
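
To illustrate how a list of `sort_by` columns pairs with a list of `ascending` flags, here is one way such a multi-key sort could behave. It is a sketch under assumptions: `sort_rows` is a hypothetical helper, values are compared as plain strings, and the library's actual ordering rules may differ.

```python
def sort_rows(rows, sort_by, ascending=True):
    """Sort dict rows by one or more columns, each with its own direction."""
    keys = [sort_by] if isinstance(sort_by, str) else list(sort_by)
    flags = [ascending] * len(keys) if isinstance(ascending, bool) else list(ascending)
    if len(flags) != len(keys):
        raise ValueError("ascending must be a bool or match sort_by in length")

    result = list(rows)  # leave the caller's list untouched
    # Sort by the least significant key first; list.sort is stable,
    # so earlier passes survive as tie-breakers for later ones.
    for key, asc in reversed(list(zip(keys, flags))):
        result.sort(key=lambda r, k=key: r[k], reverse=not asc)
    return result


# Made-up sample rows.
rows = [
    {"team": "b", "name": "Zed"},
    {"team": "a", "name": "Amy"},
    {"team": "a", "name": "Bea"},
]
ordered = sort_rows(rows, ["team", "name"], [True, False])
names = [r["name"] for r in ordered]  # team ascending, then name descending
```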

#### `chunk_size`

When provided, `write()` splits the output into multiple files named `output_001.csv`, `output_002.csv`, etc. and returns a `list[Path]`.

```python
paths = runner.write("Step 2 of 2", "output.csv", chunk_size=1000)
# [Path("output_001.csv"), Path("output_002.csv"), ...]
```
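
The numbered file names follow from the output path and the row count. A minimal sketch of that naming, assuming a three-digit suffix inserted before the extension as in the example above; `chunk_paths` is a hypothetical helper, not part of the library, and how the library handles an empty row list is not specified here.

```python
from pathlib import Path


def chunk_paths(output_path: str, total_rows: int, chunk_size: int) -> list[Path]:
    """Derive output_001.csv, output_002.csv, ... for a chunked write."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    base = Path(output_path)
    n_chunks = -(-total_rows // chunk_size)  # ceiling division
    return [
        base.with_name(f"{base.stem}_{i:03d}{base.suffix}")
        for i in range(1, n_chunks + 1)
    ]


# 2500 rows at 1000 rows per file need three files.
paths = chunk_paths("output.csv", total_rows=2500, chunk_size=1000)
```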

## License

MIT
