Metadata-Version: 2.4
Name: csv-turbo
Version: 1.0.0
Summary: High-performance CSV reader/writer for Python - streaming, typed columns, large files
Project-URL: Homepage, https://sarmkadan.com
Project-URL: Repository, https://github.com/Sarmkadan/csv-turbo
Project-URL: Bug Tracker, https://github.com/Sarmkadan/csv-turbo/issues
Author-email: Vladyslav Zaiets <rutova2@gmail.com>
License: MIT
License-File: LICENSE
Keywords: csv,data,parser,performance,reader,stream,writer
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# csv-turbo

High-performance CSV reader/writer for Python — streaming, typed columns, large files.

[![PyPI version](https://img.shields.io/pypi/v/csv-turbo.svg)](https://pypi.org/project/csv-turbo/)
[![Python](https://img.shields.io/pypi/pyversions/csv-turbo.svg)](https://pypi.org/project/csv-turbo/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

## Features

- **Streaming by default** — iterate rows without loading the entire file into memory
- **Typed columns** — declare int, float, bool, date, datetime schemas and get cast values automatically
- **Schema validation** — catch missing/extra columns and type errors early
- **Dialect detection** — auto-detect delimiter, quoting style, and line endings
- **Pipeline API** — composable lazy transformations: filter, map, select, rename, chunk, sort, unique
- **Column statistics** — profile any CSV with counts, nulls, min/max, mean, std dev, top-N values
- **Built-in dialects** — RFC 4180, TSV, semicolon, pipe, Excel presets
- **Zero dependencies** — pure Python ≥ 3.9, standard library only

## Installation

```bash
pip install csv-turbo
```

## Quick Start

### Reading

```python
from csv_turbo import CsvReader

for row in CsvReader("data.csv"):
    print(row["name"], row["score"])
```

### Writing

```python
from csv_turbo import CsvWriter

rows = [
    {"id": 1, "name": "Alice", "score": 9.5},
    {"id": 2, "name": "Bob",   "score": 8.0},
]

with CsvWriter("output.csv") as w:
    w.writerows(rows)
```

### Typed Schema

```python
from csv_turbo import CsvReader, Schema, ColumnDef, INT, FLOAT, STRING, DATE

schema = Schema([
    ColumnDef("id",    INT,    nullable=False),
    ColumnDef("name",  STRING),
    ColumnDef("score", FLOAT,  nullable=True, default=0.0),
    ColumnDef("dob",   DATE),
])

for row in CsvReader("data.csv", schema=schema):
    # row["id"] is int, row["score"] is float, row["dob"] is datetime.date
    print(row["id"], row["score"])
```

### Streaming Large Files (Chunked)

```python
from csv_turbo import CsvReader

reader = CsvReader("huge_file.csv")
for batch in reader.chunks(500):
    db.bulk_insert(batch)           # `db` is a placeholder client; each batch is up to 500 row dicts
```

### Pipeline API

```python
from csv_turbo import CsvReader
from csv_turbo.streaming import Pipeline

result = (
    Pipeline(CsvReader("sales.csv", schema=schema))
    .filter(lambda r: r["amount"] > 100)
    .map(lambda r: {**r, "vat": round(r["amount"] * 0.2, 2)})
    .select("id", "name", "amount", "vat")
    .sort(key=lambda r: r["amount"], reverse=True)
    .take(50)
    .to_list()
)
```

Write pipeline output directly to CSV:

```python
Pipeline(CsvReader("raw.csv"))     \
    .where(status="active")        \
    .drop("internal_notes")        \
    .write_csv("clean.csv")
```

### Dialect Detection

```python
from csv_turbo import CsvReader
from csv_turbo.dialect import detect_dialect_from_file

dialect = detect_dialect_from_file("european_export.csv")
print(dialect.delimiter)   # likely ";"

for row in CsvReader("european_export.csv", delimiter=dialect.delimiter):
    ...
```

Use a preset dialect:

```python
from csv_turbo import TSV, CsvReader

for row in CsvReader("data.tsv", delimiter=TSV.delimiter):
    ...
```

### Column Statistics / Profiling

```python
from csv_turbo import CsvReader, profile

rows = CsvReader("data.csv").read_all()
data_profile = profile(rows, top_n=5)
print(data_profile)

# Access per-column stats
stats = data_profile.column("score")
print(stats.mean, stats.std_dev, stats.null_count)
```

### Infer Schema Automatically

```python
from csv_turbo import CsvReader

schema = CsvReader.infer_schema("data.csv", sample_size=500)
print(schema)   # Schema([id:int, name:string, score:float, dob:date])

for row in CsvReader("data.csv", schema=schema):
    print(row)
```

### Read from String (Testing / Inline Data)

```python
from csv_turbo import CsvReader

csv_text = "id,name,score\n1,Alice,9.5\n2,Bob,8.0\n"
for row in CsvReader.from_string(csv_text):
    print(row)
```

### Write to String

```python
from csv_turbo import CsvWriter

csv_str = CsvWriter.to_string([
    {"x": 1, "y": 2},
    {"x": 3, "y": 4},
])
print(csv_str)
```

## API Reference

### `CsvReader`

```python
CsvReader(
    source,            # str | Path | file-like
    *,
    schema=None,       # Schema for typed casting
    delimiter=",",
    quotechar='"',
    encoding="utf-8",
    has_header=True,
    skip_blank_lines=True,
    strip_whitespace=True,
    null_values={""},  # Values treated as None
    row_filter=None,   # Callable[[dict], bool]
    row_transform=None,# Callable[[dict], dict]
    strict=False,      # Raise on extra columns
    max_errors=0,      # Tolerated cast errors
)
```

| Method / Property | Description |
|---|---|
| `__iter__()` | Iterate rows as dicts |
| `chunks(size)` | Iterate in batches |
| `read_all()` | Load all rows into a list |
| `count_rows()` | Count without materialising |
| `headers` | Column names after first read |
| `error_count` | Cast errors encountered |
| `CsvReader.from_string(text)` | Create from a CSV string |
| `CsvReader.infer_schema(path)` | Auto-detect schema |
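
A quick sketch of these accessors, assuming the names behave as the table above describes (`data.csv` is a placeholder file):

```python
from csv_turbo import CsvReader

total = CsvReader("data.csv").count_rows()   # row count without materialising rows

reader = CsvReader("data.csv")
first = next(iter(reader))                   # read a single row
print(reader.headers)                        # column names, populated after the first read
print(reader.error_count)                    # cast errors so far (0 when no schema is set)
```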

### `CsvWriter`

```python
CsvWriter(
    destination,           # str | Path | file-like
    *,
    fieldnames=None,       # Column order
    delimiter=",",
    quotechar='"',
    encoding="utf-8",
    write_header=True,
    formatter=None,        # Callable[[Any], str]
    column_formatters={},  # Per-column formatters
    row_transform=None,
    append=False,
    buffer_size=8192,
)
```

| Method | Description |
|---|---|
| `writerow(row)` | Write one dict row |
| `writerows(rows)` | Write an iterable of rows |
| `rows_written` | Count of rows written |
| `CsvWriter.to_string(rows)` | Render to string |
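
A minimal sketch of incremental writing with these methods, using the `append` and `write_header` options from the constructor above:

```python
from csv_turbo import CsvWriter

# Append rows to an existing file one at a time, skipping the header.
with CsvWriter("output.csv", append=True, write_header=False) as w:
    w.writerow({"id": 3, "name": "Cara", "score": 7.5})
    w.writerow({"id": 4, "name": "Dan",  "score": 6.0})
    print(w.rows_written)  # 2
```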

### `Schema` & `ColumnDef`

```python
Schema([
    ColumnDef(name, type, nullable=True, default=None, alias=None),
    ...
])
```

Built-in type singletons: `STRING`, `INT`, `FLOAT`, `BOOL`, `DATE`, `DATETIME`

Parameterised constructors: `IntType(min_value=0)`, `FloatType(precision=2)`,
`DateType(fmt="%d/%m/%Y")`, `StringType(max_length=255)`
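
A sketch combining singletons and parameterised constructors in one schema (assuming the constructors are importable from the top-level package, which the README does not state explicitly):

```python
from csv_turbo import Schema, ColumnDef
from csv_turbo import IntType, FloatType, DateType, StringType  # import path assumed

schema = Schema([
    ColumnDef("id",    IntType(min_value=0), nullable=False),
    ColumnDef("name",  StringType(max_length=255)),
    ColumnDef("price", FloatType(precision=2)),
    ColumnDef("dob",   DateType(fmt="%d/%m/%Y")),
])
```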

### `Pipeline`

| Method | Description |
|---|---|
| `.filter(fn)` | Keep rows where `fn(row)` is True |
| `.map(fn)` | Transform each row |
| `.select(*cols)` | Keep only named columns |
| `.rename(mapping)` | Rename columns |
| `.drop(*cols)` | Remove columns |
| `.add_field(name, fn)` | Compute a new column |
| `.where(**kwargs)` | Equality filter shorthand |
| `.skip(n)` | Skip first n rows |
| `.take(n)` | Keep first n rows |
| `.chunk(size)` | Group into batches |
| `.sort(key)` | Sort (materialises) |
| `.unique(key)` | Deduplicate rows |
| `.peek(fn)` | Side-effect per row |
| `.to_list()` | Materialise to list |
| `.to_dict(key)` | Index by column |
| `.count()` | Count rows |
| `.first()` | First row or None |
| `.aggregate(col, ...)` | Sum/mean/min/max |
| `.write_csv(path)` | Write to file |
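
A sketch chaining a few of these methods, based only on the table above (that `.unique` takes a callable key and that `.to_dict` indexes by column value are assumptions):

```python
from csv_turbo import CsvReader
from csv_turbo.streaming import Pipeline

users_by_id = (
    Pipeline(CsvReader("users.csv"))
    .where(status="active")            # equality-filter shorthand
    .unique(lambda r: r["email"])      # drop duplicate emails (key assumed callable)
    .to_dict("id")                     # map each row's "id" value to the row dict
)
```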

### `profile(rows)`

```python
data_profile = profile(rows, top_n=10)
# DataProfile.row_count, .columns, .column(name)
# ColumnStats.count, .null_count, .fill_rate, .unique_count
# ColumnStats.mean, .std_dev, .min_value, .max_value, .median (numeric)
# ColumnStats.min_length, .max_length, .avg_length (string)
# ColumnStats.top_values — list of (value, count)
```
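
Continuing the snippet above, a short sketch that walks the profile, assuming `.columns` yields column names as the attribute list suggests:

```python
for name in data_profile.columns:
    stats = data_profile.column(name)
    print(name, stats.fill_rate, stats.top_values[:3])
```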

## Error Handling

All exceptions inherit from `CsvTurboError`:

| Exception | When |
|---|---|
| `ParseError` | Malformed CSV structure |
| `TypeCastError` | Value cannot be cast to declared type |
| `SchemaValidationError` | Missing/extra columns |
| `WriteError` | Write failure |
| `ConfigurationError` | Invalid reader/writer options |

```python
from csv_turbo import CsvReader, TypeCastError, SchemaValidationError

try:
    for row in CsvReader("data.csv", schema=schema, max_errors=5):
        process(row)
except TypeCastError as e:
    print(f"Bad value '{e.value}' in column '{e.column}' on line {e.line}")
except SchemaValidationError as e:
    print(f"Missing columns: {e.missing_columns}")
```
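
Write-side failures follow the same pattern; a hedged sketch, assuming `WriteError` is importable from the top level like the reader exceptions:

```python
from csv_turbo import CsvWriter, WriteError

try:
    with CsvWriter("/readonly/output.csv") as w:   # path chosen to fail
        w.writerow({"id": 1})
except WriteError as e:
    print(f"Write failed: {e}")
```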

## License

MIT © [Vladyslav Zaiets](https://sarmkadan.com)
