Metadata-Version: 2.4
Name: tha-csv-runner
Version: 0.2.0
Summary: Run a function over every row of a CSV — with progress, header validation, and structured per-row errors.
Project-URL: Homepage, https://github.com/tha-guy-nate/tha-csv-runner
Project-URL: Issues, https://github.com/tha-guy-nate/tha-csv-runner/issues
Author: Nate Wright
License: MIT
License-File: LICENSE
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: tqdm>=4.66
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Description-Content-Type: text/markdown

# tha-csv-runner

[![CI](https://github.com/tha-guy-nate/tha-csv-runner/actions/workflows/ci.yml/badge.svg)](https://github.com/tha-guy-nate/tha-csv-runner/actions/workflows/ci.yml)

A small Python library that runs a function against every row of a CSV — with a progress bar, required header validation, and structured error capture per row.

## Install

```bash
pip install tha-csv-runner
```

## Quick start

```python
from tha_csv_runner import ThaCSV

def process(row: dict) -> None:
    """Raise any exception to mark the row as an error. Return value is ignored."""
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")

runner = ThaCSV()

runner.read("Step 1 of 2", "data.csv", ["name", "email"], process)
runner.write("Step 2 of 2", "output.csv")
```

## How it works

1. Opens the CSV and validates that all `required_headers` are present — raises immediately if any are missing
2. Iterates every row with a `tqdm` progress bar labelled with `desc`
3. Calls your `processor(row)` function — if it raises, that row is marked as an error and processing continues
4. Appends three columns to every row: `row number`, `row status`, and `message`
   - On success: `row status` and `message` are blank
   - On error: `row status = "error"`, `message = str(exception)`
5. `write()` writes all rows (success and error) to a CSV
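
The steps above can be approximated with the standard library alone. This is a minimal sketch for illustration, not the library's actual implementation: `run_rows` is a hypothetical helper, the progress bar is omitted, `ValueError` stands in for the library's `ConfigError`, and numbering rows from 1 is an assumption.

```python
import csv
import io


def run_rows(csv_text, required_headers, processor):
    """Hypothetical stand-in for the read step; not the library's code."""
    reader = csv.DictReader(io.StringIO(csv_text))

    # Step 1: fail fast if any required header is absent.
    missing = [h for h in required_headers if h not in (reader.fieldnames or [])]
    if missing:
        # The library raises ConfigError; ValueError keeps this sketch stdlib-only.
        raise ValueError(f"missing required headers: {missing}")

    rows = []
    # Steps 2-4: visit every row, capture exceptions, append the three columns.
    for number, row in enumerate(reader, start=1):  # 1-based numbering is assumed
        status, message = "", ""
        try:
            processor(row)
        except Exception as exc:
            status, message = "error", str(exc)
        rows.append(
            {**row, "row number": number, "row status": status, "message": message}
        )
    return rows


def process(row):
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")


SAMPLE = "name,email\nAda,ada@example.com\nBob,bob@elsewhere.org\n"
rows = run_rows(SAMPLE, ["name", "email"], process)
```

In this sketch the second row fails, so it carries `row status = "error"` and the exception text in `message`, while the first row leaves both blank.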

## API

### `ThaCSV`

```python
ThaCSV()
```

### `runner.read()`

```python
runner.read(
    "Step 2 of 10",          # progress bar label — pass None to use the filename
    "data.csv",              # path to input CSV
    ["a", "b"],              # columns that must exist — raises ConfigError if missing
    processor=my_func,       # optional: callable(row: dict) -> None
    sample=100,              # optional: process only the first N rows
    enrich=True,             # optional: set False to skip row number/status/message columns
)
```

Reads the CSV and processes every row, or only the first `sample` rows when `sample` is set. Results are stored in `runner.rows` as a list of dicts.

When `enrich=False`, the `row number`, `row status`, and `message` columns are not added, and an exception raised by the processor is re-raised instead of being captured.
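
Because results are plain dicts, failed rows can be picked out with ordinary list operations. The sketch below uses a made-up list in place of `runner.rows` so it runs on its own. The exact value types (for example, whether `row number` is an `int` or a `str`) are an assumption.

```python
# Stand-in for runner.rows after an enriched read(); the data is invented.
rows = [
    {"name": "Ada", "email": "ada@example.com",
     "row number": 1, "row status": "", "message": ""},
    {"name": "Bob", "email": "bob@elsewhere.org",
     "row number": 2, "row status": "error", "message": "invalid email domain"},
]

# Keep only the rows whose processor call raised.
failed = [row for row in rows if row["row status"] == "error"]

for row in failed:
    print(f'row {row["row number"]}: {row["message"]}')
```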

### `runner.write()`

```python
runner.write(
    "Step 10 of 10",                   # progress bar label — pass None to use the output filename
    output_path="output.csv",          # optional — auto-named input_processed_TIMESTAMP.csv if omitted
    sort_by="name",                    # optional — column name, or list of column names
    ascending=True,                    # optional — bool or list of bools matching sort_by
    column_order=["name", "email"],    # optional — listed columns come first, rest follow
    keep=["name", "email"],            # optional — keep only these columns (mutually exclusive with drop)
    drop=["row number"],               # optional — remove these columns (mutually exclusive with keep)
)
```

Returns the `Path` that was written.
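
To illustrate how the column options could combine, here is a small stdlib-only sketch. `arrange_columns` is a hypothetical helper, not part of the library, and applying `keep`/`drop` before `column_order` is an assumption about ordering that the library may not share.

```python
def arrange_columns(columns, column_order=None, keep=None, drop=None):
    """Hypothetical helper: pick and order output columns."""
    if keep is not None and drop is not None:
        raise ValueError("keep and drop are mutually exclusive")
    if keep is not None:
        columns = [c for c in columns if c in keep]
    if drop is not None:
        columns = [c for c in columns if c not in drop]
    # Listed columns come first, in the order given; the rest keep their order.
    first = [c for c in (column_order or []) if c in columns]
    rest = [c for c in columns if c not in first]
    return first + rest


columns = ["name", "email", "row number", "row status", "message"]
ordered = arrange_columns(columns, column_order=["email", "name"], drop=["row number"])
```

Here `ordered` puts `email` and `name` first, removes `row number`, and leaves the remaining columns in their original order.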

## License

MIT
