Metadata-Version: 2.1
Name: pushcart
Version: 1.7.5
Summary: Metadata transformations for Spark
License: Apache-2.0
Author: Victor Blaga
Author-email: victor.blaga@revodata.nl
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: authlib (>=1.3.0,<2.0.0)
Requires-Dist: azure-monitor-opentelemetry (>=1.4.0,<2.0.0)
Requires-Dist: delta-spark (>=3.1.0,<4.0.0)
Requires-Dist: httpx (>=0.27.0,<0.28.0)
Requires-Dist: loguru (>=0.7.2,<0.8.0)
Requires-Dist: mkdocstrings[python] (>=0.25.1,<0.26.0)
Requires-Dist: opentelemetry-sdk (>=1.24.0,<2.0.0)
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: validators (>=0.28.1,<0.29.0)
Description-Content-Type: text/markdown

# pushcart

Helps with moving potatoes, bricks and data around.

Pushcart is a metadata-based solution accelerator running on top of Spark. It
also provides a set of ready-made data transformations that would otherwise take
a lot of code to build using only `pyspark.sql.functions`.

## Who is this for?

- Data engineers writing pure Spark code
- Data engineers working with Databricks Delta Live Tables

## What does the metadata look like?

A metadata specification, useful for transforming data from the bronze to the
silver layer, looks like this:

column_order|source_column_name|source_column_type|dest_column_name|dest_column_type|transform_function|default_value|validation_rule|validation_action
------------|------------------|------------------|----------------|----------------|------------------|-------------|---------------|-----------------
0|id|string|Id|int|||`Id IS NOT NULL`|DROP
1|first_name|string||||||
2|surname|string||||||
3|||Name|string|`F.concat_ws(' ', F.col('first_name'), F.col('surname'))`|`F.lit('John Doe')`||
4|dob|string|DateOfBirth|date|`F.to_date(F.col('dob'), 'yyyy-MM-dd')`|||
5|record_ts|string|RecordDateTime|timestamp|`F.to_timestamp(F.col('record_ts'), 'yyyy-MM-dd HH:mm:ss')`||`RecordDateTime IS NOT NULL`|DROP

