Metadata-Version: 2.3
Name: pydantic-pyspark
Version: 0.1.0
Summary: Pydantic BaseModel extension that emits PySpark schemas
Author: John Ensley
Author-email: John Ensley <johnensley17@gmail.com>
License: MIT
Requires-Dist: pydantic>=2.0
Requires-Dist: pyspark>=3.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# pydantic-pyspark

A tiny package that extends Pydantic v2's `BaseModel` with a `pyspark_schema()`
classmethod: subclass `SparkModel`, define your data contract once with Pydantic,
and get a matching PySpark `StructType` for free.

## Install

```bash
pip install pydantic-pyspark
```

## Usage

```python
from typing import Optional
from pydantic_pyspark import SparkModel

class Address(SparkModel):
    street: str
    zip_code: str

class User(SparkModel):
    id: int
    name: str
    email: Optional[str] = None
    tags: list[str] = []
    address: Address

print(User.pyspark_schema())
# StructType([
#     StructField('id', LongType(), False),
#     StructField('name', StringType(), False),
#     StructField('email', StringType(), True),
#     StructField('tags', ArrayType(StringType(), False), False),
#     StructField('address', StructType([...]), False),
# ])
```
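Under the hood, schema generation amounts to a recursive walk over the model's field annotations. The stdlib-only sketch below illustrates the idea (it is not the package's actual implementation): it emits Spark DDL type strings instead of `StructType` objects so it runs without `pyspark`, and `to_ddl` and `_SCALARS` are hypothetical names.

```python
from typing import get_args, get_origin, get_type_hints

# Spark DDL names for a few scalar types (subset of the mapping table; illustrative only)
_SCALARS = {str: "string", int: "bigint", float: "double", bool: "boolean"}

def to_ddl(tp) -> str:
    """Sketch: turn a Python annotation into a Spark DDL type string."""
    if tp in _SCALARS:
        return _SCALARS[tp]
    origin = get_origin(tp)
    if origin in (list, set):                      # list[T] / set[T] -> array<T>
        return f"array<{to_ddl(get_args(tp)[0])}>"
    if origin is dict:                             # dict[K, V] -> map<K,V>
        key, value = get_args(tp)
        return f"map<{to_ddl(key)},{to_ddl(value)}>"
    if isinstance(tp, type):                       # any annotated class -> nested struct
        fields = ",".join(f"{name}:{to_ddl(t)}"
                          for name, t in get_type_hints(tp).items())
        return f"struct<{fields}>"
    raise TypeError(f"unsupported annotation: {tp!r}")

class Address:
    street: str
    zip_code: str

print(to_ddl(Address))              # struct<street:string,zip_code:string>
print(to_ddl(dict[str, list[int]])) # map<string,array<bigint>>
```

The real package builds `StructField`s rather than DDL strings, but the recursion over `get_origin`/`get_args` is the same shape.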

## Type mapping

| Python / Pydantic                 | PySpark                  |
|-----------------------------------|--------------------------|
| `str`, `uuid.UUID`                | `StringType`             |
| `int`                             | `LongType`               |
| `float`                           | `DoubleType`             |
| `bool`                            | `BooleanType`            |
| `bytes`                           | `BinaryType`             |
| `datetime.datetime`               | `TimestampType`          |
| `datetime.date`                   | `DateType`               |
| `datetime.timedelta`              | `DayTimeIntervalType`    |
| `decimal.Decimal`                 | `DecimalType(38, 18)`    |
| `list[T]` / `set[T]`              | `ArrayType(T)`           |
| `dict[K, V]`                      | `MapType(K, V)`          |
| nested `SparkModel` / `BaseModel` | `StructType`             |
| `Optional[T]` / `T \| None`       | `T` with `nullable=True` |

Unions other than `Optional[T]` are not supported: Spark has no sum types, so each field must map to exactly one data type.

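Detecting `Optional[T]` reduces to checking whether an annotation is a union whose only non-`None` member is `T`. A sketch of that check (again not the package's actual code; `split_optional` is a hypothetical name) covering both `Optional[str]` and the 3.10+ `str | None` spelling:

```python
import types
from typing import Optional, Union, get_args, get_origin

def split_optional(tp):
    """Return (inner_type, nullable); only Optional[T] unwraps, other unions are rejected."""
    origin = get_origin(tp)
    # typing.Union covers Optional[T]; types.UnionType (Python 3.10+) covers T | None
    if origin is Union or (hasattr(types, "UnionType") and origin is types.UnionType):
        inner = [a for a in get_args(tp) if a is not type(None)]
        if len(inner) != 1:
            raise TypeError(f"unsupported union: {tp!r}")
        return inner[0], True   # inner type, nullable=True
    return tp, False            # plain annotation, nullable=False

print(split_optional(Optional[str]))  # inner type str, nullable True
print(split_optional(int))            # inner type int, nullable False
```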