Metadata-Version: 2.4
Name: ccflow-etl
Version: 0.3.0
Summary: ETL Tools for ccflow
Project-URL: Repository, https://github.com/1kbgz/ccflow-etl
Project-URL: Homepage, https://github.com/1kbgz/ccflow-etl
Author-email: 1kbgz <dev@1kbgz.com>
License: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.10
Requires-Dist: ccflow>=0.8.5
Requires-Dist: hydra-core
Requires-Dist: lerna
Requires-Dist: pyarrow
Requires-Dist: pydantic
Provides-Extra: develop
Requires-Dist: build; extra == 'develop'
Requires-Dist: bump-my-version; extra == 'develop'
Requires-Dist: check-dist; extra == 'develop'
Requires-Dist: codespell; extra == 'develop'
Requires-Dist: hatchling; extra == 'develop'
Requires-Dist: mdformat; extra == 'develop'
Requires-Dist: mdformat-tables>=1; extra == 'develop'
Requires-Dist: pytest; extra == 'develop'
Requires-Dist: pytest-cov; extra == 'develop'
Requires-Dist: ruff; extra == 'develop'
Requires-Dist: twine; extra == 'develop'
Requires-Dist: ty; extra == 'develop'
Requires-Dist: uv; extra == 'develop'
Requires-Dist: wheel; extra == 'develop'
Description-Content-Type: text/markdown

# ccflow-etl

Domain-neutral ETL building blocks for `ccflow` callable models.

[![Build Status](https://github.com/1kbgz/ccflow-etl/actions/workflows/build.yaml/badge.svg?branch=main&event=push)](https://github.com/1kbgz/ccflow-etl/actions/workflows/build.yaml)
[![codecov](https://codecov.io/gh/1kbgz/ccflow-etl/branch/main/graph/badge.svg)](https://codecov.io/gh/1kbgz/ccflow-etl)
[![License](https://img.shields.io/github/license/1kbgz/ccflow-etl)](https://github.com/1kbgz/ccflow-etl)
[![PyPI](https://img.shields.io/pypi/v/ccflow-etl.svg)](https://pypi.python.org/pypi/ccflow-etl)

`ccflow-etl` provides reusable support primitives for ETL-style workflows built as concrete `ccflow` `CallableModel` graphs. Generic execution concerns live in this package; workflow-specific behavior stays with the package or application that owns the workflow.

## Install

```bash
pip install ccflow-etl
```

Connector-backed cache and artifact stores are provided by connector packages that own their IO. Generic checkpointing belongs in `ccflow` proper.

| Package         | Type               | Integration                                   |
| --------------- | ------------------ | --------------------------------------------- |
| `ccflow-s3`     | generic, storage   | S3-backed artifact IO and cache               |
| `ccflow-db`     | generic, cache     | database-backed cache store                   |
| `ccflow-email`  | generic, publisher | email publishers for ETL notifications        |
| `ccflow-celery` | generic, evaluator | Celery-based evaluator for ETL task execution |

## Quick Start

`ccflow-etl` installs shared Hydra entry points for running and explaining configured callables:

```bash
cc-etl +context.path=./example-output.json +context.payload.message='hello from ccflow-etl' +context.overwrite=true
cc-etl-explain +context.path=./example-output.json
```

Most projects provide their own config directory and still use the shared entry point:

```bash
cc-etl --config-path ./config --config-name text_stats +context.input_path=./notes.txt +context.output_path=./stats.json
```
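
The `+context.…=value` arguments above are Hydra overrides: a leading `+` adds a key that the config does not already define, and the dots address nested keys. The sketch below illustrates that mapping with the standard library only. `overrides_to_dict` is a hypothetical helper written for this example, not part of `ccflow-etl` or Hydra, and Hydra's real parser handles far more (typed values, lists, deletions, config groups).

```python
from typing import Any


def overrides_to_dict(overrides: list[str]) -> dict[str, Any]:
    """Fold Hydra-style `+a.b=value` overrides into a nested dict (illustration only)."""
    result: dict[str, Any] = {}
    for item in overrides:
        key, _, value = item.partition("=")
        # A leading `+` asks Hydra to add the key; this sketch simply strips it.
        parts = key.lstrip("+").split(".")
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Values stay as strings here; Hydra itself would parse their types.
        node[parts[-1]] = value
    return result


config = overrides_to_dict(
    [
        "+context.input_path=./notes.txt",
        "+context.output_path=./stats.json",
    ]
)
print(config)
# {'context': {'input_path': './notes.txt', 'output_path': './stats.json'}}
```

Both overrides land under a single `context` mapping, which is why the command line can supply the whole context for a configured callable.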

## What It Provides

- Shared CLI entry points: `cc-etl` and `cc-etl-explain`.
- Date expansion: `Interval`, `BaseCalendar`, built-in calendars, `BackfillContext`, and `BackfillModel`.
- Generic credential models and a `/credentials` Hydra registry for package extension.
- Extract task and output composition: `ExtractTaskModel`, `LocalFileOutput`, and the `/tasks`, `/datasets`, and `/outputs` config selections.
- Handoff metadata: `ETLArtifact` for typed stage artifacts.
- Artifact IO contracts: `ArtifactExistsModel`, `ArtifactWriteModel`, `ArtifactPublishModel`, and `NoOpArtifactStore` for backend-neutral existence checks, writes, publication, and artifact URIs.
- Format-aware writes and cache handoffs: `LocalWriteModel`, `CachePutModel`, `CacheGetModel`, `PayloadCodec`, `LocalCacheStore`, and no-op cache defaults.
- Retry integration: compatibility exports for `ccflow` `RetryPolicy` and `RetryModel`; use `ccflow.evaluators.RetryEvaluator` for runtime evaluator retries.
- Execution policy: `ExecutionPolicy` for shared max-concurrency hints and rate spacing that evaluators and connector models can consume through the `/execution` Hydra group.
- Run reporting: `RunSummary` for structured counts by status and artifact stage.

## Documentation

- [CLI And Config](docs/src/cli.md)
- [Building Pipelines](docs/src/pipelines.md)
- [Backfills And Calendars](docs/src/backfills.md)
- [Handoffs, Formats, And Reliability](docs/src/handoffs.md)
- [API Reference](docs/src/api.md)
- [Development](docs/src/development.md)

## Package Boundaries

`ccflow-etl` owns domain-neutral ETL contracts, generic credential shapes, and helpers. It does not own application workflows, provider clients, connector clients, provider-specific credential semantics, dataset inventories, dataset-specific schemas, run-reporting evaluators, checkpointing, or domain-specific rules. Durable store implementations should live in connector packages and integrate through the generic cache and artifact IO contracts.
