Metadata-Version: 2.4
Name: polypolars
Version: 0.1.0
Summary: Generate Polars DataFrames using polyfactory for testing and development
Project-URL: Homepage, https://github.com/eddiethedean/polypolars
Project-URL: Documentation, https://github.com/eddiethedean/polypolars#readme
Project-URL: Repository, https://github.com/eddiethedean/polypolars
Project-URL: Issues, https://github.com/eddiethedean/polypolars/issues
Author-email: Odos Matthews <odosmatthews@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: dataframe,factory,mock-data,polars,polyfactory,testing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.8
Requires-Dist: polars>=0.20.0
Requires-Dist: polyfactory>=2.0.0
Requires-Dist: typing-extensions>=4.0.0
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: mkdocs-material>=9.0.0; extra == 'dev'
Requires-Dist: mkdocs>=1.5.0; extra == 'dev'
Requires-Dist: pydantic>=2.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# Polypolars

[![CI](https://github.com/eddiethedean/polypolars/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/eddiethedean/polypolars/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/polypolars.svg)](https://pypi.org/project/polypolars/)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)

**Generate type-safe Polars DataFrames effortlessly using polyfactory**

Inspired by [polyspark](https://github.com/eddiethedean/polyspark), polypolars lets you create test DataFrames from your Python data models, with **automatic schema inference** for Polars.

**Docs:** See the [docs/](https://github.com/eddiethedean/polypolars/tree/main/docs) folder and run `mkdocs serve` for the full API reference and examples.

## Example

```python
from dataclasses import dataclass
from polypolars import polars_factory

@polars_factory
@dataclass
class User:
    id: int
    name: str
    email: str

# Generate 1000 rows instantly:
df = User.build_dataframe(size=1000)
print(df.head())
```

Example output (data varies per run):

```
shape: (5, 3)
┌──────┬──────────────────────┬──────────────────────┐
│ id   ┆ name                 ┆ email                │
│ ---  ┆ ---                  ┆ ---                  │
│ i64  ┆ str                  ┆ str                  │
╞══════╪══════════════════════╪══════════════════════╡
│ 3167 ┆ QmYHeLMDMxWChjihAFxU ┆ vHGMKHjXsMBlxLuhqpUE │
│ 1028 ┆ hvLXPtlqURtwzqeyJruo ┆ ePDAdtelIEiRfEuAgoPz │
│ 9048 ┆ NhnyGGQsTjxPEndxaOCt ┆ znmByWtpwofUGKolkJrs │
│  971 ┆ ZlkxcjcVAZfLUkCwHRFG ┆ PTtzmMHcvLQPcOrAgFpl │
│ 3813 ┆ tIqqrgyYjULzdyRKkMKK ┆ tMAFeQewaQFtRGEvOdqW │
└──────┴──────────────────────┴──────────────────────┘
```

## Contents

- [Why Polypolars?](https://github.com/eddiethedean/polypolars#why-polypolars) · [Installation](https://github.com/eddiethedean/polypolars#installation) · [Quick Start](https://github.com/eddiethedean/polypolars#quick-start) · [Schema inference](https://github.com/eddiethedean/polypolars#schema-inference) · [Type mapping](https://github.com/eddiethedean/polypolars#type-mapping) · [CLI](https://github.com/eddiethedean/polypolars#cli) · [I/O and testing](https://github.com/eddiethedean/polypolars#io-and-testing)

## Why Polypolars?

- **Factory pattern**: Leverage [polyfactory](https://github.com/litestar-org/polyfactory) for data generation
- **Type-safe schema**: Python types become Polars dtypes automatically
- **Nullable handling**: `Optional[T]` and defaults are reflected in the schema
- **Complex types**: Nested structs, lists, and dicts (as list-of-structs)
- **Multiple models**: Dataclasses, Pydantic, and TypedDict

## Installation

```bash
pip install polypolars
```

For development:

```bash
pip install "polypolars[dev]"
```

## Quick Start

### Decorator (recommended)

```python
from dataclasses import dataclass
from typing import Optional
from polypolars import polars_factory

@polars_factory
@dataclass
class Product:
    product_id: int
    name: str
    price: float
    description: Optional[str] = None
    in_stock: bool = True

# Build Polars DataFrame
df = Product.build_dataframe(size=100)
print(df.head())

# Or get dicts
dicts = Product.build_dicts(size=50)
```

Example output (first 5 rows; data varies per run):

```
shape: (5, 5)
┌────────────┬──────────────────────┬──────────────┬──────────────────────┬──────────┐
│ product_id ┆ name                 ┆ price        ┆ description          ┆ in_stock │
│ ---        ┆ ---                  ┆ ---          ┆ ---                  ┆ ---      │
│ i64        ┆ str                  ┆ f64          ┆ str                  ┆ bool     │
╞════════════╪══════════════════════╪══════════════╪══════════════════════╪══════════╡
│ 5582       ┆ hKJsoOOXlwgLIiiWOCJP ┆ 2.2760e8     ┆ rTUACBLlGBlHXIjzVvPt ┆ false    │
│ 7099       ┆ ZgUiDVJirxAYRrWIPnpS ┆ 274887.17671 ┆ bHGMXNFRLSDifpywMZrY ┆ true     │
│ 5372       ┆ MTtVHJkqneaCkoyZNgio ┆ 1.5195e7     ┆ HsAmRwgaphvQxOCJwjSr ┆ false    │
│ 8650       ┆ fTBYFPiWMFCKauieEXlu ┆ -7.8765e8    ┆ UAnyfVhTUmvcjtzbCufq ┆ true     │
│ 1023       ┆ MCtTOwvJTjfbpPELcFKm ┆ -97.933431   ┆ PMEHaEOGaoJiDaomXdVX ┆ false    │
└────────────┴──────────────────────┴──────────────┴──────────────────────┴──────────┘
```

### Classic factory class

```python
from polypolars import PolarsFactory

# Product is the dataclass from the decorator example above.
class ProductFactory(PolarsFactory[Product]):
    __model__ = Product

df = ProductFactory.build_dataframe(size=100)
```

### Convenience function

```python
from polypolars import build_polars_dataframe

df = build_polars_dataframe(Product, size=100)
```

## Schema inference

The schema is inferred from your type hints rather than from the generated values, so all-null columns still get the correct dtype:

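```python
from dataclasses import dataclass
from typing import Optional
from polypolars import polars_factory

@polars_factory
@dataclass
class User:
    id: int
    email: Optional[str]  # nullable string in Polars

df = User.build_dataframe(size=100)  # schema: id Int64, email String
```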
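
To see why reading the hints matters, the following standard-library sketch contrasts guessing a column type from its values with reading it from a dataclass definition. The helpers `infer_from_values` and `infer_from_hints` are hypothetical names used only for illustration; they are not part of polypolars, and the library's own inference may work differently.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin


@dataclass
class User:
    id: int
    email: Optional[str]


def infer_from_values(rows, column):
    """Value-based guess: the type of the first non-null value, else None."""
    for row in rows:
        if row[column] is not None:
            return type(row[column])
    return None  # an all-null column gives nothing to inspect


def infer_from_hints(model, column):
    """Hint-based lookup: return (python_type, nullable) for a dataclass field."""
    hint = {f.name: f.type for f in fields(model)}[column]
    if get_origin(hint) is Union:  # Optional[T] is Union[T, None]
        inner = [a for a in get_args(hint) if a is not type(None)]
        return inner[0], True
    return hint, False


# Every email is null, so the values alone cannot reveal the column type.
rows = [{"id": 1, "email": None}, {"id": 2, "email": None}]
print(infer_from_values(rows, "email"))  # None
print(infer_from_hints(User, "email"))   # (<class 'str'>, True)
```

The hint-based path never looks at the data, which is why it still yields a string column when every generated value is null.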

## From dicts

```python
dicts = Product.build_dicts(size=1000)
# Convert to DataFrame when needed:
df = Product.create_dataframe_from_dicts(dicts)
```

## Pydantic

```python
from pydantic import BaseModel, Field
from polypolars import polars_factory

@polars_factory
class User(BaseModel):
    id: int = Field(gt=0)
    username: str = Field(min_length=3, max_length=20)
    email: str
    is_active: bool = True

df = User.build_dataframe(size=500)
```

## Type mapping

| Python                                | Polars                     |
|---------------------------------------|----------------------------|
| `str`                                 | `String`                   |
| `int`                                 | `Int64`                    |
| `float`                               | `Float64`                  |
| `bool`                                | `Boolean`                  |
| `datetime`                            | `Datetime`                 |
| `date`                                | `Date`                     |
| `List[T]`                             | `List(T)`                  |
| `Dict[K,V]`                           | `List(Struct(key, value))` |
| `Optional[T]`                         | `T` (nullable)             |
| `Tuple[T, ...]`                       | `List(T)`                  |
| `Tuple[T, T, ...]` (fixed length `n`) | `Array(T, n)`              |
| Dataclass / Pydantic                  | `Struct(...)`              |

Use `schema_overrides` (e.g. `{"col": pl.Categorical}`) to override inferred types.
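
As a rough illustration of the table above, here is a standard-library sketch that walks a type hint and returns the matching dtype name as a plain string. The function `dtype_name` and the `Point` dataclass are hypothetical; this is not polypolars' implementation, and it produces names only, not Polars dtype objects.

```python
from dataclasses import dataclass, fields, is_dataclass
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple, Union, get_args, get_origin

# Scalar rows of the table, as dtype names (strings, not Polars objects).
SCALARS = {str: "String", int: "Int64", float: "Float64",
           bool: "Boolean", datetime: "Datetime", date: "Date"}


def dtype_name(hint):
    """Return a Polars-style dtype name for a Python type hint (illustrative)."""
    if hint in SCALARS:
        return SCALARS[hint]
    origin, args = get_origin(hint), get_args(hint)
    if origin is Union:  # Optional[T]: nullability does not change the name
        inner = [a for a in args if a is not type(None)]
        if len(inner) == 1:
            return dtype_name(inner[0])
    if origin is list:
        return f"List({dtype_name(args[0])})"
    if origin is dict:  # dicts become a list of key/value structs
        key, value = dtype_name(args[0]), dtype_name(args[1])
        return f"List(Struct(key: {key}, value: {value}))"
    if origin is tuple:
        if len(args) == 2 and args[1] is Ellipsis:  # variable length
            return f"List({dtype_name(args[0])})"
        if args and len(set(args)) == 1:  # fixed length, one element type
            return f"Array({dtype_name(args[0])}, {len(args)})"
    if is_dataclass(hint):  # nested model becomes a struct
        inner = ", ".join(f"{f.name}: {dtype_name(f.type)}" for f in fields(hint))
        return f"Struct({inner})"
    raise TypeError(f"no mapping for {hint!r}")


@dataclass
class Point:
    x: float
    y: float


print(dtype_name(Dict[str, List[int]]))
print(dtype_name(Tuple[float, float, float]))
print(dtype_name(Optional[Point]))
```

Each branch corresponds to one row of the table; anything the table does not list raises `TypeError` rather than guessing.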

## LazyFrame and chunked building

```python
# LazyFrame
lf = Product.build_lazy_dataframe(size=10_000)

# Chunked building for very large size (lower memory)
df = Product.build_dataframe(size=1_000_000, chunk_size=10_000)
```
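
The README does not describe how `chunk_size` is applied internally. As a sketch of the general idea, assuming rows are produced in batches of at most `chunk_size` and then combined, a requested size could be split like this (`chunk_sizes` is a hypothetical helper, not a polypolars function):

```python
from typing import Iterator


def chunk_sizes(total: int, chunk_size: int) -> Iterator[int]:
    """Yield batch sizes that sum to `total`, each at most `chunk_size`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    remaining = total
    while remaining > 0:
        step = min(chunk_size, remaining)
        yield step
        remaining -= step


# One million rows in batches of ten thousand.
sizes = list(chunk_sizes(1_000_000, 10_000))
print(len(sizes), sizes[0], sizes[-1])  # 100 10000 10000
```

Only one batch of Python objects needs to be alive at a time under this scheme, which is where the lower peak memory would come from.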

## CLI

```bash
# Export schema
polypolars schema export myapp.models:User --output schema.txt

# Validate a file against a model
polypolars schema validate myapp.models:User data.parquet

# Generate sample data
polypolars generate myapp.models:User --size 1000 --output users.parquet --format parquet
```
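
The CLI commands address a model as `module:ClassName`. A minimal sketch of how such a string can be resolved with the standard library follows; `resolve_model` is a hypothetical helper shown for illustration, and the CLI's actual loading code may differ.

```python
import importlib


def resolve_model(spec: str):
    """Import `module` and return the attribute `ClassName` from 'module:ClassName'."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:ClassName', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# A standard-library class stands in for myapp.models:User here.
cls = resolve_model("collections:OrderedDict")
print(cls.__name__)  # OrderedDict
```

For a spec such as `myapp.models:User` to resolve this way, the `myapp` package has to be importable from the directory where the command runs.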

## I/O and testing

```python
from polypolars import (
    save_as_parquet,
    load_parquet,
    load_and_validate,
    infer_schema,
    assert_dataframe_equal,
    assert_schema_equal,
)

# User is any model decorated with @polars_factory, as in the examples above.
df = User.build_dataframe(size=1000)
save_as_parquet(df, "users.parquet")

# Load and validate
schema = infer_schema(User)
df2 = load_and_validate("users.parquet", expected_schema=schema)

assert_dataframe_equal(df, df2, check_order=False)
```
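
The README does not spell out what `check_order=False` compares. Assuming it means that row order is ignored, the underlying idea can be sketched in plain Python as a multiset comparison of rows. `rows_equal_ignoring_order` is a hypothetical helper, not the library's implementation.

```python
from collections import Counter


def rows_equal_ignoring_order(rows_a, rows_b):
    """True when both inputs hold the same rows with the same multiplicity."""
    # Counter treats each row as a hashable key, so duplicates are counted.
    return Counter(map(tuple, rows_a)) == Counter(map(tuple, rows_b))


a = [(1, "ann"), (2, "bob"), (2, "bob")]
b = [(2, "bob"), (1, "ann"), (2, "bob")]  # same rows, shuffled
c = [(1, "ann"), (2, "bob")]              # one duplicate missing

print(rows_equal_ignoring_order(a, b))  # True
print(rows_equal_ignoring_order(a, c))  # False
```

Counting rows rather than comparing sets keeps duplicates significant, so two frames with different row counts are never reported as equal.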

## License

MIT

## Related

- [polyspark](https://github.com/eddiethedean/polyspark) – inspiration for this library
- [polyfactory](https://github.com/litestar-org/polyfactory) – factory library for mock data
- [Polars](https://www.pola.rs/) – fast DataFrame library
