Metadata-Version: 2.4
Name: quicketl
Version: 1.6.0
Summary: QuickETL - Fast & Flexible Python ETL Framework with 20+ backend support via Ibis
Project-URL: Homepage, https://quicketl.com
Project-URL: Documentation, https://quicketl.com
Project-URL: Repository, https://github.com/ameijin/quicketl
Project-URL: Issues, https://github.com/ameijin/quicketl/issues
Author-email: Eiji <eiji@eidosoft.co>
License: MIT
License-File: LICENSE
Keywords: data-engineering,duckdb,elt,etl,ibis,pipeline,polars,quicketl
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: fsspec>=2024.6
Requires-Dist: ibis-framework[duckdb,polars]>=9.0
Requires-Dist: pydantic>=2.10
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: structlog>=24.0
Requires-Dist: typer>=0.12
Provides-Extra: ai
Requires-Dist: nltk>=3.8; extra == 'ai'
Requires-Dist: openai>=1.0; extra == 'ai'
Requires-Dist: pgvector>=0.2; extra == 'ai'
Requires-Dist: pinecone-client>=3.0; extra == 'ai'
Requires-Dist: psycopg2-binary>=2.9; extra == 'ai'
Requires-Dist: qdrant-client>=1.7; extra == 'ai'
Requires-Dist: sentence-transformers>=2.2; extra == 'ai'
Requires-Dist: tiktoken>=0.5; extra == 'ai'
Provides-Extra: all
Requires-Dist: adlfs>=2024.4; extra == 'all'
Requires-Dist: azure-identity>=1.15; extra == 'all'
Requires-Dist: azure-keyvault-secrets>=4.8; extra == 'all'
Requires-Dist: azure-storage-blob>=12.19; extra == 'all'
Requires-Dist: boto3>=1.34; extra == 'all'
Requires-Dist: gcsfs>=2024.6; extra == 'all'
Requires-Dist: google-cloud-storage>=2.14; extra == 'all'
Requires-Dist: ibis-framework[bigquery]>=9.0; extra == 'all'
Requires-Dist: ibis-framework[clickhouse]>=9.0; extra == 'all'
Requires-Dist: ibis-framework[datafusion]>=9.0; extra == 'all'
Requires-Dist: ibis-framework[mysql]>=9.0; extra == 'all'
Requires-Dist: ibis-framework[postgres]>=9.0; extra == 'all'
Requires-Dist: ibis-framework[pyspark]>=9.0; extra == 'all'
Requires-Dist: ibis-framework[snowflake]>=9.0; extra == 'all'
Requires-Dist: ibis-framework[trino]>=9.0; extra == 'all'
Requires-Dist: mkdocs-gen-files>=0.5; extra == 'all'
Requires-Dist: mkdocs-literate-nav>=0.6; extra == 'all'
Requires-Dist: mkdocs-material>=9.5; extra == 'all'
Requires-Dist: mkdocs-section-index>=0.3; extra == 'all'
Requires-Dist: mkdocs>=1.6; extra == 'all'
Requires-Dist: mkdocstrings[python]>=0.25; extra == 'all'
Requires-Dist: nltk>=3.8; extra == 'all'
Requires-Dist: openai>=1.0; extra == 'all'
Requires-Dist: openlineage-python>=1.8; extra == 'all'
Requires-Dist: opentelemetry-api>=1.22; extra == 'all'
Requires-Dist: opentelemetry-exporter-otlp>=1.22; extra == 'all'
Requires-Dist: opentelemetry-sdk>=1.22; extra == 'all'
Requires-Dist: pandas>=2.0; extra == 'all'
Requires-Dist: pandera[polars]>=0.20; extra == 'all'
Requires-Dist: pgvector>=0.2; extra == 'all'
Requires-Dist: pinecone-client>=3.0; extra == 'all'
Requires-Dist: psycopg2-binary>=2.9; extra == 'all'
Requires-Dist: pymdown-extensions>=10.0; extra == 'all'
Requires-Dist: qdrant-client>=1.7; extra == 'all'
Requires-Dist: s3fs>=2024.6; extra == 'all'
Requires-Dist: sentence-transformers>=2.2; extra == 'all'
Requires-Dist: setuptools>=60.0; extra == 'all'
Requires-Dist: tiktoken>=0.5; extra == 'all'
Provides-Extra: aws
Requires-Dist: boto3>=1.34; extra == 'aws'
Requires-Dist: s3fs>=2024.6; extra == 'aws'
Provides-Extra: azure
Requires-Dist: adlfs>=2024.4; extra == 'azure'
Requires-Dist: azure-storage-blob>=12.19; extra == 'azure'
Provides-Extra: bigquery
Requires-Dist: ibis-framework[bigquery]>=9.0; extra == 'bigquery'
Provides-Extra: chunking
Requires-Dist: nltk>=3.8; extra == 'chunking'
Requires-Dist: tiktoken>=0.5; extra == 'chunking'
Provides-Extra: clickhouse
Requires-Dist: ibis-framework[clickhouse]>=9.0; extra == 'clickhouse'
Provides-Extra: contracts
Requires-Dist: pandera[polars]>=0.20; extra == 'contracts'
Provides-Extra: datafusion
Requires-Dist: ibis-framework[datafusion]>=9.0; extra == 'datafusion'
Provides-Extra: docs
Requires-Dist: mkdocs-gen-files>=0.5; extra == 'docs'
Requires-Dist: mkdocs-literate-nav>=0.6; extra == 'docs'
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs-section-index>=0.3; extra == 'docs'
Requires-Dist: mkdocs>=1.6; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.25; extra == 'docs'
Requires-Dist: pymdown-extensions>=10.0; extra == 'docs'
Provides-Extra: embeddings-huggingface
Requires-Dist: sentence-transformers>=2.2; extra == 'embeddings-huggingface'
Provides-Extra: embeddings-openai
Requires-Dist: openai>=1.0; extra == 'embeddings-openai'
Requires-Dist: tiktoken>=0.5; extra == 'embeddings-openai'
Provides-Extra: enterprise
Requires-Dist: azure-identity>=1.15; extra == 'enterprise'
Requires-Dist: azure-keyvault-secrets>=4.8; extra == 'enterprise'
Requires-Dist: boto3>=1.34; extra == 'enterprise'
Requires-Dist: openlineage-python>=1.8; extra == 'enterprise'
Requires-Dist: opentelemetry-api>=1.22; extra == 'enterprise'
Requires-Dist: opentelemetry-exporter-otlp>=1.22; extra == 'enterprise'
Requires-Dist: opentelemetry-sdk>=1.22; extra == 'enterprise'
Provides-Extra: gcp
Requires-Dist: gcsfs>=2024.6; extra == 'gcp'
Requires-Dist: google-cloud-storage>=2.14; extra == 'gcp'
Provides-Extra: mysql
Requires-Dist: ibis-framework[mysql]>=9.0; extra == 'mysql'
Provides-Extra: openlineage
Requires-Dist: openlineage-python>=1.8; extra == 'openlineage'
Provides-Extra: opentelemetry
Requires-Dist: opentelemetry-api>=1.22; extra == 'opentelemetry'
Requires-Dist: opentelemetry-exporter-otlp>=1.22; extra == 'opentelemetry'
Requires-Dist: opentelemetry-sdk>=1.22; extra == 'opentelemetry'
Provides-Extra: pandas
Requires-Dist: pandas>=2.0; extra == 'pandas'
Provides-Extra: postgres
Requires-Dist: ibis-framework[postgres]>=9.0; extra == 'postgres'
Provides-Extra: quality
Requires-Dist: pandera[polars]>=0.20; extra == 'quality'
Provides-Extra: secrets-aws
Requires-Dist: boto3>=1.34; extra == 'secrets-aws'
Provides-Extra: secrets-azure
Requires-Dist: azure-identity>=1.15; extra == 'secrets-azure'
Requires-Dist: azure-keyvault-secrets>=4.8; extra == 'secrets-azure'
Provides-Extra: snowflake
Requires-Dist: ibis-framework[snowflake]>=9.0; extra == 'snowflake'
Provides-Extra: spark
Requires-Dist: ibis-framework[pyspark]>=9.0; extra == 'spark'
Requires-Dist: setuptools>=60.0; extra == 'spark'
Provides-Extra: trino
Requires-Dist: ibis-framework[trino]>=9.0; extra == 'trino'
Provides-Extra: vector-pgvector
Requires-Dist: pgvector>=0.2; extra == 'vector-pgvector'
Requires-Dist: psycopg2-binary>=2.9; extra == 'vector-pgvector'
Provides-Extra: vector-pinecone
Requires-Dist: pinecone-client>=3.0; extra == 'vector-pinecone'
Provides-Extra: vector-qdrant
Requires-Dist: qdrant-client>=1.7; extra == 'vector-qdrant'
Description-Content-Type: text/markdown

# QuickETL

**Fast & Flexible Python ETL Framework with 20+ backend support via Ibis**

[![PyPI version](https://badge.fury.io/py/quicketl.svg)](https://pypi.org/project/quicketl/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

QuickETL is a configuration-driven ETL framework that provides a simple, unified API for data processing across multiple compute backends including DuckDB, Polars, Spark, and pandas.

**[Documentation](https://quicketl.com)** | **[GitHub](https://github.com/ameijin/quicketl)**

## Features

- **20+ Backends**: DuckDB, Polars, Spark, pandas, Snowflake, BigQuery, PostgreSQL, and more via Ibis
- **Configuration-driven**: Define pipelines in YAML with variable substitution
- **18 Transforms**: filter, aggregate, join, union, derive_column, window, pivot, unpivot, hash_key, coalesce, cast, fill_null, dedup, sort, select, rename, limit, and more
- **6 Quality Checks**: not_null, unique, row_count, accepted_values, expression, and contract (Pandera)
- **Data Contracts**: Schema validation with Pandera, YAML-defined contracts, and a contract registry
- **Multi-Source Pipelines**: Join and union across multiple data sources in a single pipeline
- **Database Sink**: Write to databases with append, truncate, replace, and upsert modes
- **Partitioned Writes**: Write partitioned Parquet/CSV files by one or more columns
- **Workflows**: Multi-stage pipeline orchestration with parallel execution
- **AI/ML Transforms**: Text chunking and embedding generation for RAG pipelines
- **Secrets Management**: Pluggable providers for AWS Secrets Manager, Azure Key Vault, and env vars
- **Telemetry**: OpenTelemetry and OpenLineage integration for observability
- **CLI & Python API**: Use `quicketl run` or the Pipeline builder pattern
- **Cloud Storage**: S3, GCS, Azure via fsspec

## Installation

```bash
pip install quicketl
```

The DuckDB and Polars backends ship with the core install; add extras for other backends and features:

```bash
# Specific backends
pip install "quicketl[postgres]"
pip install "quicketl[spark]"

# AI/ML features
pip install "quicketl[embeddings-openai]"
pip install "quicketl[chunking]"

# Data contracts
pip install "quicketl[contracts]"

# All optional dependencies
pip install "quicketl[all]"
```

See [installation docs](https://quicketl.com/getting-started/installation/) for backend-specific extras.

## Quick Start

```bash
# Create a new project
quicketl init my_project
cd my_project

# Run the sample pipeline
quicketl run pipelines/sample.yml
```

Or use the Python API:

```python
from quicketl import Pipeline

# From YAML configuration
pipeline = Pipeline.from_yaml("pipeline.yml")
result = pipeline.run()

# Or use the builder pattern
from quicketl.config.models import FileSource, FileSink
from quicketl.config.transforms import FilterTransform, AggregateTransform
from quicketl.config.checks import NotNullCheck

pipeline = (
    Pipeline("sales_summary", engine="duckdb")
    .source(FileSource(path="data/sales.parquet"))
    .transform(FilterTransform(predicate="amount > 0"))
    .transform(AggregateTransform(
        group_by=["region"],
        aggs={"total": "sum(amount)", "count": "count(*)"},
    ))
    .check(NotNullCheck(columns=["region"]))
    .sink(FileSink(path="output/summary.parquet"))
)
result = pipeline.run()
print(result.summary())
```

## Example Pipeline

```yaml
name: sales_etl
engine: duckdb

source:
  type: file
  path: data/sales.parquet

transforms:
  - op: filter
    predicate: amount > 0
  - op: derive_column
    name: revenue
    expr: quantity * unit_price
  - op: aggregate
    group_by: [region]
    aggs:
      total: sum(amount)
      order_count: count(*)

checks:
  - type: not_null
    columns: [region, total]
  - type: row_count
    min: 1

sink:
  type: file
  path: output/summary.parquet
```
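The feature list mentions variable substitution in YAML configs. As a rough sketch of how a parameterized pipeline might look — assuming a `${VAR}`-style placeholder resolved from the environment, which is a common convention but not confirmed here (check the configuration docs for the exact syntax):

```yaml
name: sales_etl
engine: duckdb

source:
  type: file
  # ${RUN_DATE} is a hypothetical placeholder name used for illustration
  path: data/${RUN_DATE}/sales.parquet

sink:
  type: file
  path: output/${RUN_DATE}/summary.parquet
```

Parameterizing paths this way lets the same pipeline definition serve daily partitioned runs without editing the YAML.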

## Multi-Source Join

```yaml
name: orders_with_customers
engine: duckdb

sources:
  orders:
    type: file
    path: data/orders.parquet
  customers:
    type: file
    path: data/customers.parquet

transforms:
  - op: join
    right: customers
    "on": [customer_id]
    how: left
  - op: select
    columns: [order_id, customer_name, amount]

sink:
  type: file
  path: output/enriched_orders.parquet
```
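The features above also list a database sink with append, truncate, replace, and upsert modes. A hedged sketch of what such a sink block could look like — the `type`, `connection`, `mode`, and `keys` field names are assumptions for illustration, not confirmed API:

```yaml
sink:
  type: database        # assumed sink type name
  connection: warehouse # assumed reference to a configured connection
  table: sales_summary
  mode: upsert          # one of: append, truncate, replace, upsert
  keys: [region]        # assumed: match columns for upsert
```

With an upsert mode, rows matching on the key columns would be updated in place while new keys are inserted, which suits incremental summary tables.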

## Documentation

Full documentation, tutorials, and API reference at **[quicketl.com](https://quicketl.com)**

- [Getting Started](https://quicketl.com/getting-started/)
- [Pipeline Configuration](https://quicketl.com/guides/configuration/)
- [Supported Backends](https://quicketl.com/guides/backends/)
- [CLI Reference](https://quicketl.com/reference/cli/)

## License

MIT License - see [LICENSE](LICENSE) for details.
