Metadata-Version: 2.4
Name: rust-data-processing
Version: 0.2.2
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Summary: Python bindings for rust-data-processing: schema-first CSV/JSON/Parquet/Excel ingestion into an in-memory DataSet.
Author: rust-data-processing contributors
License: MIT OR Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://vihangdesai2018-png.github.io/rust-data-processing/python/examples.html
Project-URL: Repository, https://github.com/vihangdesai2018-png/rust-data-processing

# rust-data-processing

![Phase 2 scope: Phase 1 baseline plus export, privacy, Arrow, incremental ETL → Python; JVM planned](https://raw.githubusercontent.com/vihangdesai2018-png/rust-data-processing/main/docs/images/phase-2-scope-overview.png)

Python bindings for the **[rust-data-processing](https://docs.rs/rust-data-processing)** crate. The package provides schema-first ingestion from CSV, JSON, Parquet, and Excel into an in-memory **`DataSet`**, plus profiling, validation, Polars-backed pipelines, and SQL. **Phase 2** adds JSONL export, privacy transforms and summaries, median, Arrow interop, and incremental ingest helpers.

*Infographic: Phase 2 scope (the Phase 1 flow plus export, privacy, median, Arrow, and incremental ETL); JVM support is planned for Phase 3.*

This page is the **PyPI** project description (Python-only). Clone the [repository](https://github.com/vihangdesai2018-png/rust-data-processing) for developer setup, Rust sources, and the full monorepo README.

## Install

```bash
pip install rust-data-processing
```

Requires **Python 3.10+**.
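
To confirm the install from Python, a minimal sketch using only the standard library is shown below. The helper names `meets_python_requirement` and `installed_version` are hypothetical and are not part of this package; the distribution name is taken from the `pip` command above.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # from "Requires Python 3.10+"
DIST_NAME = "rust-data-processing"  # distribution name used in the pip command


def meets_python_requirement(version_info=sys.version_info):
    """Return True when the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name=DIST_NAME):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("python ok:", meets_python_requirement())
    print("installed version:", installed_version())
```

If `installed_version()` returns `None`, the package is not visible to the interpreter that ran the check, which often means `pip` installed into a different environment.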

## Quick start

```python
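import rust_data_processing as rdp

# Declare the expected columns up front (schema-first ingestion).
schema = [
    {"name": "id", "data_type": "int64"},
    {"name": "name", "data_type": "utf8"},
]
# Read the CSV into an in-memory DataSet using the declared schema.
ds = rdp.ingest_from_path("path/to/data.csv", schema, {"format": "csv"})
print("rows", ds.row_count())

# Profile the dataset; options are passed as a plain dict.
report = rdp.profile_dataset(ds, {"head_rows": 50, "quantiles": [0.5]})
print("profile rows sampled", report["row_count"])

# Validate the dataset: a not_null check on "id" with error severity.
validation = rdp.validate_dataset(
    ds,
    {"checks": [{"kind": "not_null", "column": "id", "severity": "error"}]},
)
print("checks", validation["summary"]["total_checks"])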
```
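
The quick start points at a placeholder path. To try it without your own data, the sketch below writes a small CSV that matches the quick-start schema and then runs the same `ingest_from_path` call on it. The sample rows and the helper names `write_sample_csv` and `ingest_row_count` are hypothetical, and the sketch assumes the CSV reader expects a header row whose names match the schema.

```python
import csv
import tempfile
from pathlib import Path

# Schema copied from the quick start.
SCHEMA = [
    {"name": "id", "data_type": "int64"},
    {"name": "name", "data_type": "utf8"},
]

# Hypothetical sample rows, invented for illustration only.
SAMPLE_ROWS = [(1, "alice"), (2, "bob"), (3, "carol")]


def write_sample_csv(directory):
    """Write a small CSV whose header matches SCHEMA and return its path."""
    path = Path(directory) / "sample.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Assumption: the reader expects a header row named after the schema fields.
        writer.writerow([field["name"] for field in SCHEMA])
        writer.writerows(SAMPLE_ROWS)
    return path


def ingest_row_count(path):
    """Run the quick-start ingest call; return None if the package is not installed."""
    try:
        import rust_data_processing as rdp
    except ImportError:
        return None
    ds = rdp.ingest_from_path(str(path), SCHEMA, {"format": "csv"})
    return ds.row_count()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = write_sample_csv(tmp)
        print("sample file:", sample.name)
        print("rows:", ingest_row_count(sample))
```

Writing into a temporary directory keeps the sample file out of your working tree; replace `write_sample_csv` with a real path once the call works on your machine.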

## Phase 2 (export, privacy, JSONL, median, Delta handoff)

Copy-paste snippets: **[Phase 2 Python examples (Markdown in repo)](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/docs/python/PHASE2_EXAMPLES.md)**. These APIs are also summarized in **[API.md](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/python-wrapper/API.md)** (section **Export, privacy summaries, truncation (Phase 2)**).

## Documentation

| Resource | Link |
| --- | --- |
| **This package on PyPI** | [pypi.org/project/rust-data-processing](https://pypi.org/project/rust-data-processing/) |
| **Python examples (HTML, pdoc)** | [GitHub Pages — examples](https://vihangdesai2018-png.github.io/rust-data-processing/python/examples.html) |
| **Python API (HTML, pdoc)** | [GitHub Pages — Python](https://vihangdesai2018-png.github.io/rust-data-processing/python/) |
| **Python API (markdown)** | [API.md in the repository](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/python-wrapper/API.md) |
| **Combined site (landing + Rust rustdoc)** | [GitHub Pages — home](https://vihangdesai2018-png.github.io/rust-data-processing/) |
| **Rust crate API** | [docs.rs/rust-data-processing](https://docs.rs/rust-data-processing) |
| **Repository** | [github.com/vihangdesai2018-png/rust-data-processing](https://github.com/vihangdesai2018-png/rust-data-processing) |

## License

Licensed under MIT OR Apache-2.0; see [LICENSE-MIT](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/LICENSE-MIT) and [LICENSE-APACHE](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/LICENSE-APACHE) in the repository.

