Metadata-Version: 2.4
Name: rust-data-processing
Version: 0.2.0
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
License-File: LICENSE-APACHE
License-File: LICENSE-MIT
Summary: Python bindings for rust-data-processing: schema-first CSV/JSON/Parquet/Excel ingestion into an in-memory DataSet.
Author: rust-data-processing contributors
License: MIT OR Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://vihangdesai2018-png.github.io/rust-data-processing/python/examples.html
Project-URL: Repository, https://github.com/vihangdesai2018-png/rust-data-processing

# rust-data-processing

![Phase 1 scope: sources → rust-data-processing → Python / optional AI & ML surfaces](https://raw.githubusercontent.com/vihangdesai2018-png/rust-data-processing/main/docs/images/phase-1-scope-overview.png)

*Infographic: Phase 1, a single-node, library-first flow (ingest → `DataSet`, pipelines, SQL, profile, validate, outliers, transforms, parallel execution, PyO3 bindings, optional chatbot / notebook story).*

Python bindings for the **[rust-data-processing](https://docs.rs/rust-data-processing)** crate: schema-first ingestion from CSV, JSON, Parquet, and Excel into an in-memory **`DataSet`**, with profiling, validation, Polars-backed pipelines, and SQL.

This page is the **PyPI** project description (Python-only). Clone the [repository](https://github.com/vihangdesai2018-png/rust-data-processing) for developer setup, Rust sources, and the full monorepo README.

## Install

```bash
pip install rust-data-processing
```

Requires **Python 3.10+**.

## Quick start

```python
import rust_data_processing as rdp

schema = [
    {"name": "id", "data_type": "int64"},
    {"name": "name", "data_type": "utf8"},
]
ds = rdp.ingest_from_path("path/to/data.csv", schema, {"format": "csv"})
print("rows", ds.row_count())

report = rdp.profile_dataset(ds, {"head_rows": 50, "quantiles": [0.5]})
print("profile rows sampled", report["row_count"])

validation = rdp.validate_dataset(
    ds,
    {"checks": [{"kind": "not_null", "column": "id", "severity": "error"}]},
)
print("checks", validation["summary"]["total_checks"])
```
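The quick start expects a CSV at `path/to/data.csv`. To try it without your own data, the sketch below writes a small three-row CSV whose header matches the two-column schema. It also builds the schema and a not-null validation spec in the same dict shapes the quick start uses. The code uses only the standard library. `schema_from_mapping`, `not_null_checks`, and `write_sample_csv` are hypothetical helpers, and the sample rows are made-up illustration data.

```python
import csv
import tempfile
from pathlib import Path

# Column names and logical types, matching the quick-start schema.
COLUMNS = {"id": "int64", "name": "utf8"}


def schema_from_mapping(columns: dict[str, str]) -> list[dict[str, str]]:
    """Hypothetical helper: build the list-of-dicts schema shape used in the quick start."""
    return [{"name": name, "data_type": dtype} for name, dtype in columns.items()]


def not_null_checks(columns, severity: str = "error") -> dict:
    """Hypothetical helper: one not_null check per column, in the quick start's check shape."""
    return {
        "checks": [
            {"kind": "not_null", "column": col, "severity": severity} for col in columns
        ]
    }


def write_sample_csv(directory: Path) -> Path:
    """Hypothetical helper: write a header row plus three illustrative data rows."""
    path = directory / "data.csv"
    rows = [(1, "alice"), (2, "bob"), (3, "carol")]
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS.keys())  # header: id,name
        writer.writerows(rows)
    return path


tmp_dir = Path(tempfile.mkdtemp())
csv_path = write_sample_csv(tmp_dir)
schema = schema_from_mapping(COLUMNS)
checks = not_null_checks(COLUMNS)
print(csv_path, schema, checks)
```

You could then pass `str(csv_path)`, `schema`, and `{"format": "csv"}` to `rdp.ingest_from_path`, and `checks` to `rdp.validate_dataset`, as in the quick start. The sketch passes a `str` rather than a `Path` as a conservative choice, because this page does not say whether `pathlib.Path` is accepted.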

## Phase 2 (export, privacy, JSONL, median, Delta handoff)

Copy-paste snippets: **[Phase 2 Python examples (Markdown in repo)](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/docs/python/PHASE2_EXAMPLES.md)**. These APIs are also summarized in **[API.md](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/python-wrapper/API.md)** (section **Export, privacy summaries, truncation (Phase 2)**).

## Documentation

| Resource | Link |
| --- | --- |
| **Python examples (HTML, pdoc)** | [GitHub Pages — examples](https://vihangdesai2018-png.github.io/rust-data-processing/python/examples.html) |
| **Python API (HTML, pdoc)** | [GitHub Pages — Python](https://vihangdesai2018-png.github.io/rust-data-processing/python/) |
| **Python API (markdown)** | [API.md in the repository](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/python-wrapper/API.md) |
| **Combined site (landing + Rust rustdoc)** | [GitHub Pages — home](https://vihangdesai2018-png.github.io/rust-data-processing/) |
| **Rust crate API** | [docs.rs/rust-data-processing](https://docs.rs/rust-data-processing) |
| **Repository** | [github.com/vihangdesai2018-png/rust-data-processing](https://github.com/vihangdesai2018-png/rust-data-processing) |

## License

MIT OR Apache-2.0. See [LICENSE-MIT](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/LICENSE-MIT) and [LICENSE-APACHE](https://github.com/vihangdesai2018-png/rust-data-processing/blob/main/LICENSE-APACHE) in the repository.

