Metadata-Version: 2.4
Name: dpyr
Version: 0.0.1
Summary: dplyr for Python: tidy piped verbs over polars and duckdb, with real autocompletion. Name reservation — API under active development.
Project-URL: Repository, https://github.com/maximerivest/dataframe
Author-email: Maxime Rivest <mrive052@gmail.com>
License: MIT
Keywords: data-analysis,dataframe,dplyr,duckdb,polars,tidyverse
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# dpyr

**dplyr for Python.** A tidy, pipe-style data manipulation API on top of
[polars](https://pola.rs) and [duckdb](https://duckdb.org), with real IDE
autocompletion and dplyr-faithful semantics, verified by differential testing
against dplyr itself.

```python
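from dpyr import read_parquet, col, n, desc, starts_with

# Load the example data; the file path is a placeholder.
starwars = read_parquet("starwars.parquet")
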
(
    starwars
    .filter(col.height > 180, col.mass < 100)
    .mutate(bmi = col.mass / (col.height / 100) ** 2)
    .group_by(col.species)
    .summarize(
        n = n(),
        mean_bmi = col.bmi.mean(),
    )
    .arrange(desc(col.mean_bmi))
)
```

## Principles (the elevator pitch)

1. **dplyr's vocabulary, Python's idiom.** The verbs are dplyr's, verbatim
   (`filter`, `mutate`, `select`, `arrange`, `group_by`, `summarize`,
   joins, tidyselect). The pipe is Python's: method chaining.
2. **As lazy as possible internally, as eager as possible observably.**
   Verbs build a plan; schema errors raise immediately on the offending
   line; displaying/exporting auto-collects. Interactive feel, query-engine
   performance.
3. **Autocompletion is a feature, not an accident.** The `col` proxy and
   per-schema stub generation make column names and column-typed methods
   complete in any IDE.
4. **Two backends, one semantics.** polars (in-memory/files) and duckdb
   (SQL pushdown) must agree, bit-for-bit modulo the documented semantics
   spec. Verified continuously.
5. **dplyr is the oracle.** Compatibility is demonstrated, not claimed:
   golden outputs are generated by actual dplyr in CI.
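
To make principles 2 and 3 concrete, here is a minimal standard-library sketch of the idea, not dpyr's actual implementation. The names `ToyFrame`, `ColumnRef`, `SchemaError`, and `_ColProxy` are hypothetical and exist only for illustration. Each verb appends a step to a plan without touching any data, but validates column names against the current schema right away, so a mistake surfaces on the line that made it.

```python
from dataclasses import dataclass


class SchemaError(Exception):
    """Raised as soon as a verb references a column missing from the schema."""


@dataclass(frozen=True)
class ColumnRef:
    name: str


class _ColProxy:
    """Attribute access builds a column reference: col.height -> ColumnRef('height')."""

    def __getattr__(self, name: str) -> ColumnRef:
        if name.startswith("__"):
            raise AttributeError(name)  # keep dunder lookups (copy, pickle) sane
        return ColumnRef(name)


col = _ColProxy()


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for a lazy frame: a schema plus a recorded plan."""

    schema: tuple[str, ...]
    plan: tuple[tuple[str, tuple[str, ...]], ...] = ()

    def _check(self, refs: tuple[ColumnRef, ...]) -> tuple[str, ...]:
        # Eager part: validate names now, even though nothing is computed yet.
        for ref in refs:
            if ref.name not in self.schema:
                raise SchemaError(
                    f"unknown column {ref.name!r}; available: {list(self.schema)}"
                )
        return tuple(ref.name for ref in refs)

    def select(self, *refs: ColumnRef) -> "ToyFrame":
        names = self._check(refs)
        # Lazy part: record the step and narrow the schema; no data is read.
        return ToyFrame(schema=names, plan=self.plan + (("select", names),))

    def arrange(self, *refs: ColumnRef) -> "ToyFrame":
        names = self._check(refs)
        return ToyFrame(schema=self.schema, plan=self.plan + (("arrange", names),))


frame = ToyFrame(schema=("name", "height", "mass"))
step = frame.select(col.name, col.height).arrange(col.height)
print(step.plan)

try:
    step.arrange(col.mass)  # 'mass' was dropped by select, so this fails here
except SchemaError as exc:
    print("caught:", exc)
```

A real implementation would also need typed columns and expression objects; per principle 3, IDE completion of column names comes from per-schema stub generation, which a bare `__getattr__` proxy like this one does not provide on its own.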

## Project documents

| Doc | What it pins down |
|---|---|
| [docs/DESIGN.md](docs/DESIGN.md) | API design, laziness/materialization model, autocompletion strategy, architecture |
| [docs/SEMANTICS.md](docs/SEMANTICS.md) | The conformance spec: every deliberate decision where R, polars and duckdb disagree |
| [docs/TESTING.md](docs/TESTING.md) | Test strategy: dplyr-as-oracle, backend differential tests, Hypothesis properties |
| [docs/ROADMAP.md](docs/ROADMAP.md) | Epics and stories to MVP, in dependency order |
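
As a rough illustration of what a backend differential test means, the sketch below runs the same grouped summary through two independent implementations and asserts that they agree. It is an assumption-laden toy: plain Python and the standard-library `sqlite3` module stand in for the polars and duckdb backends, the rows are made up, and the function names are hypothetical rather than taken from docs/TESTING.md.

```python
import sqlite3
from collections import defaultdict

# Made-up rows of (group, value); integers keep the comparison exact.
ROWS = [("a", 10), ("b", 7), ("a", 5), ("c", 3), ("b", 1)]


def summarize_python(rows):
    """In-memory stand-in backend: count and sum per group, sorted by group."""
    acc = defaultdict(lambda: [0, 0])
    for group, value in rows:
        acc[group][0] += 1
        acc[group][1] += value
    return sorted((g, n, total) for g, (n, total) in acc.items())


def summarize_sqlite(rows):
    """SQL stand-in backend: the same summary pushed down to a query engine."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (grp TEXT, value INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        cur = con.execute(
            "SELECT grp, COUNT(*), SUM(value) FROM t GROUP BY grp ORDER BY grp"
        )
        return [tuple(row) for row in cur.fetchall()]
    finally:
        con.close()


def backends_agree(rows):
    """Differential check: both backends must return identical results."""
    return summarize_python(rows) == summarize_sqlite(rows)


assert backends_agree(ROWS)
print(summarize_python(ROWS))
```

In a property-based setup, `ROWS` would be replaced by generated inputs so that disagreements on edge cases (empty groups, nulls, ties in ordering) are found automatically.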

## Status

Pre-MVP. Version 0.0.1 reserves the package name, and the API is under active
development. The plan is in [docs/ROADMAP.md](docs/ROADMAP.md).
