Metadata-Version: 2.4
Name: dltaf
Version: 0.2.5
Summary: Manifest-driven data loading framework with canonical SQLDB manifests, Airflow helpers, and pluggable private integrations.
Author: Pavel Kovalev
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/PaulKov/dltaf
Project-URL: Documentation, https://paulkov.github.io/dltaf/
Project-URL: Issues, https://github.com/PaulKov/dltaf/issues
Project-URL: Changelog, https://github.com/PaulKov/dltaf/blob/master/CHANGELOG.md
Keywords: airflow,clickhouse,data-engineering,dlt,etl,plugins,yaml
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Distributed Computing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dlt<2,>=1.18.2
Requires-Dist: PyYAML>=6.0.1
Requires-Dist: pydantic<3,>=2.8
Requires-Dist: prettytable<4,>=3.12
Provides-Extra: airflow
Requires-Dist: apache-airflow<3,>=2.10; extra == "airflow"
Provides-Extra: clickhouse
Requires-Dist: dlt[clickhouse]<2,>=1.18.2; extra == "clickhouse"
Provides-Extra: sqldb
Requires-Dist: dlt[sql_database]<2,>=1.18.2; extra == "sqldb"
Requires-Dist: sqlalchemy>=2.0.25; extra == "sqldb"
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9.9; extra == "postgres"
Provides-Extra: oracle
Requires-Dist: oracledb>=2.0.0; extra == "oracle"
Requires-Dist: sqlalchemy>=2.0.25; extra == "oracle"
Provides-Extra: mongodb
Requires-Dist: pymongo>=4.6.0; extra == "mongodb"
Provides-Extra: vault
Requires-Dist: vault-kv-client>=0.1.0; extra == "vault"
Provides-Extra: runtime
Requires-Dist: vault-kv-client>=0.1.0; extra == "runtime"
Requires-Dist: dlt[clickhouse]<2,>=1.18.2; extra == "runtime"
Provides-Extra: all
Requires-Dist: apache-airflow<3,>=2.10; extra == "all"
Requires-Dist: vault-kv-client>=0.1.0; extra == "all"
Requires-Dist: dlt[clickhouse]<2,>=1.18.2; extra == "all"
Requires-Dist: dlt[sql_database]<2,>=1.18.2; extra == "all"
Requires-Dist: sqlalchemy>=2.0.25; extra == "all"
Requires-Dist: psycopg2-binary>=2.9.9; extra == "all"
Requires-Dist: oracledb>=2.0.0; extra == "all"
Requires-Dist: pymongo>=4.6.0; extra == "all"
Provides-Extra: dev
Requires-Dist: build>=1.2.2; extra == "dev"
Requires-Dist: mkdocs>=1.6.1; extra == "dev"
Requires-Dist: mkdocs-material>=9.6.14; extra == "dev"
Requires-Dist: pytest>=8.3.5; extra == "dev"
Requires-Dist: ruff>=0.11.5; extra == "dev"
Requires-Dist: twine>=6.1.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.6.1; extra == "docs"
Requires-Dist: mkdocs-material>=9.6.14; extra == "docs"
Dynamic: license-file

# dltaf

[![CI](https://github.com/PaulKov/dltaf/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/PaulKov/dltaf/actions/workflows/ci.yml)
[![Docs](https://github.com/PaulKov/dltaf/actions/workflows/pages.yml/badge.svg?branch=master)](https://github.com/PaulKov/dltaf/actions/workflows/pages.yml)
[![PyPI](https://img.shields.io/pypi/v/dltaf.svg)](https://pypi.org/project/dltaf/)
[![License](https://img.shields.io/github/license/PaulKov/dltaf.svg)](LICENSE)

`dltaf` is a manifest-driven data loading framework built around three ideas:

- canonical, reviewable YAML manifests
- a stable OSS core for generic sources
- extension registries that let private integrations stay private

The public repository ships a clean stage63-based core with:

- canonical `source.kind: sqldb` for relational ingestion
- built-in `mongodb` support
- compatibility aliases for legacy SQL manifests such as `sql_database`, `oracle_custom_sql`, and `oracle`
- Airflow DAG generation helpers
- manifest linting, doctoring, scaffolding, and lineage tooling
- Vault-backed secrets resolution through [`vault-kv-client`](https://github.com/PaulKov/vault-kv-client)

Private connectors, such as those for internal APIs, Kafka-backed flows, or company-specific uploaders, are intentionally not bundled into the OSS package. They belong in your monorepo or private package index and plug into the same runner, hook, and infra-check registries.

## Why dltaf

- `Manifest-first`: pipeline behavior stays diffable and reviewable
- `Canonical SQL model`: one public SQL contract, with legacy aliases supported as migration shims
- `Plugin-first`: private integrations extend the framework without forking it
- `Airflow-friendly`: the same manifest can be linted locally, planned in CI, and executed in DAG wrappers
- `Self-service`: example manifests, template generation, and migration guidance ship with the package

## Installation

Lean core install for linting, planning, docs, template generation, and non-runtime tooling:

```bash
pip install dltaf
```

Common runtime profiles:

```bash
# Generic Airflow bridge + DAG builder helpers
pip install "dltaf[airflow]"

# ClickHouse destination + Vault-backed private plugin flows
pip install "dltaf[runtime]"

# PostgreSQL or other SQLDB catalog ingestion into ClickHouse
pip install "dltaf[clickhouse,sqldb,postgres]"

# Oracle query-driven ingestion into ClickHouse
pip install "dltaf[clickhouse,sqldb,oracle]"

# MongoDB ingestion into ClickHouse
pip install "dltaf[clickhouse,mongodb]"
```

Developer install:

```bash
git clone https://github.com/PaulKov/dltaf.git
cd dltaf
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .[dev]
```

The public package is intentionally split into extras so Airflow `PythonVirtualenvOperator`
tasks and CI smoke jobs do not have to install the Oracle, MongoDB, and Vault clients plus
every SQL driver when they only need one runtime slice.
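A task can check that the slice it installed is actually importable before it does any real work. The sketch below is a hypothetical preflight helper and is not part of `dltaf`. The extra-to-module mapping is an assumption derived by hand from the `Requires-Dist` entries above (for example, `psycopg2-binary` is imported as `psycopg2` and `apache-airflow` as `airflow`), and it covers only a subset of the extras.

```python
import importlib.util
from typing import Dict, Iterable, List, Mapping, Tuple

# Hypothetical mapping from dltaf extras to the top-level modules they provide.
# Hand-derived from the package metadata; not an official dltaf API.
EXTRA_MODULES: Dict[str, Tuple[str, ...]] = {
    "airflow": ("airflow",),
    "sqldb": ("sqlalchemy",),
    "postgres": ("psycopg2",),
    "oracle": ("oracledb", "sqlalchemy"),
    "mongodb": ("pymongo",),
}


def missing_modules(
    extras: Iterable[str],
    mapping: Mapping[str, Tuple[str, ...]] = EXTRA_MODULES,
) -> List[str]:
    """Return modules required by `extras` that are not installed (deduplicated)."""
    missing: List[str] = []
    for extra in extras:
        # Extras that are not in the mapping are skipped.
        for module in mapping.get(extra, ()):
            # find_spec returns None when a top-level module cannot be found.
            if importlib.util.find_spec(module) is None and module not in missing:
                missing.append(module)
    return missing
```

For example, once `dltaf[clickhouse,sqldb,postgres]` is installed, `missing_modules(["sqldb", "postgres"])` should return an empty list.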

## Quick start

Validate the canonical SQL example:

```bash
dltaf manifest lint --manifest dltaf/examples/manifests/smoke_sqldb_catalog.yaml --allow-filename-mismatch
```

Render a safe execution plan without side effects:

```bash
dltaf manifest run \
  --manifest dltaf/examples/manifests/smoke_sqldb_catalog.yaml \
  --plan
```

Generate a new public-safe template:

```bash
dltaf manifest doctor \
  --template-kind sqldb_query \
  --pipeline-name dlt__oracle__to__clickhouse__raw
```

Generate Airflow DAG wrappers:

```bash
dltaf dags generate --manifests-dir ./manifests --output-dir ./generated_dags
```

Show lineage:

```bash
dltaf lineage show --format mermaid
```
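The lint and plan commands above can be chained into a CI gate. The following is a hypothetical Python wrapper that is not shipped with `dltaf`. It uses only the `dltaf manifest lint` and `dltaf manifest run --plan` invocations shown in this section, and it assumes your manifests are `*.yaml` files in a single directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def gate_commands(manifest: Path, allow_filename_mismatch: bool = False) -> List[List[str]]:
    """Build the lint-then-plan command pair for one manifest."""
    lint = ["dltaf", "manifest", "lint", "--manifest", str(manifest)]
    if allow_filename_mismatch:
        # Only needed when the file name does not match the pipeline name,
        # as with the shipped smoke examples.
        lint.append("--allow-filename-mismatch")
    # --plan renders the execution plan without side effects.
    plan = ["dltaf", "manifest", "run", "--manifest", str(manifest), "--plan"]
    return [lint, plan]


def run_gate(manifests_dir: Path, allow_filename_mismatch: bool = False) -> int:
    """Lint and plan every manifest; return the first non-zero exit code, else 0."""
    for manifest in sorted(manifests_dir.glob("*.yaml")):
        for cmd in gate_commands(manifest, allow_filename_mismatch):
            result = subprocess.run(cmd)
            if result.returncode != 0:
                print(f"failed: {' '.join(cmd)}", file=sys.stderr)
                return result.returncode
    return 0
```

A CI step can call `run_gate(Path("./manifests"))` and fail the job on a non-zero result. The gate stops at the first failure so the log points at one manifest.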

## Canonical built-ins

### `sqldb`

`sqldb` is the canonical relational source kind.

Use `mode: catalog` when you want schema- and table-driven extraction:

- PostgreSQL, MySQL, MSSQL, or other generic SQL databases
- catalog-level table selection
- canonical shape under `source.catalog`

Use `mode: query` when you want explicit Oracle SQL queries:

- one or more named queries
- query files under `dltaf/examples/sql/` or your own repo
- Oracle-specific options under `source.dialect_options`
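The two modes can also be expressed as a small pre-lint check. This hypothetical sketch works on an already-parsed `source` mapping and is not `dltaf`'s validator. It assumes `mode` sits directly under `source`. The `queries` key for named queries is an illustrative placeholder, because this README does not name that field.

```python
from typing import Any, Dict, List


def check_sqldb_source(source: Dict[str, Any]) -> List[str]:
    """Return human-readable problems for a canonical sqldb source block.

    Hypothetical sketch: `mode` placement and the `queries` key are
    assumptions, not the documented dltaf schema.
    """
    problems: List[str] = []
    if source.get("kind") != "sqldb":
        problems.append("source.kind must be 'sqldb'")
        return problems
    mode = source.get("mode")
    if mode == "catalog":
        # Catalog mode: schema/table selection lives under source.catalog.
        if not source.get("catalog"):
            problems.append("catalog mode requires source.catalog")
    elif mode == "query":
        # Query mode: one or more named queries (placeholder key name).
        if not source.get("queries"):
            problems.append("query mode requires at least one named query")
    else:
        problems.append(f"unknown mode: {mode!r}")
    return problems
```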

### `mongodb`

Use `mongodb` when you want one or more collections loaded through the bundled generic runtime:

- explicit collection selection
- optional table nesting control
- manifest-level replace/append behavior through `run.write_disposition`

## Compatibility aliases

`dltaf` still accepts older SQL source kinds as compatibility shims:

- `sql_database` -> canonicalized to `sqldb + dialect=generic + mode=catalog`
- `oracle_custom_sql` -> canonicalized to `sqldb + dialect=oracle + mode=query`
- `oracle` -> compatibility alias for Oracle query mode

The public recommendation is still to write new manifests directly in canonical `sqldb` form.
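The alias list amounts to a lookup from each legacy kind to a canonical triple. The sketch below is illustrative only and is not how `dltaf` implements canonicalization. It assumes three things: `dialect` and `mode` are keys of the `source` mapping, other keys pass through unchanged, and `oracle` expands to the same triple as `oracle_custom_sql`. The README describes `oracle` only as Oracle query mode.

```python
from typing import Any, Dict, NamedTuple


class CanonicalSql(NamedTuple):
    kind: str
    dialect: str
    mode: str


# Legacy source kinds and the canonical sqldb shape they map to.
LEGACY_SQL_KINDS: Dict[str, CanonicalSql] = {
    "sql_database": CanonicalSql("sqldb", "generic", "catalog"),
    "oracle_custom_sql": CanonicalSql("sqldb", "oracle", "query"),
    "oracle": CanonicalSql("sqldb", "oracle", "query"),  # assumed equivalent
}


def canonicalize_source(source: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `source`, rewritten to canonical sqldb form if it uses a legacy kind."""
    target = LEGACY_SQL_KINDS.get(source.get("kind", ""))
    if target is None:
        # Already canonical, or not a SQL source: return an untouched copy.
        return dict(source)
    canonical = dict(source)
    canonical.update(target._asdict())
    return canonical
```

Keeping the table explicit makes it easy to delete an alias once every manifest has migrated.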

## Private integrations

The OSS core uses three extension registries:

- runner plugins
- hook plugins
- infra-check plugins
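Conceptually, each registry is a name-to-implementation map that plugin modules fill in when they are imported. The sketch below shows that general pattern with a hypothetical `Registry` class. `dltaf`'s actual registry objects and decorator names are not documented in this README and may differ. The `customer_export` name only echoes the example module path used in this section.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    """Hypothetical name -> implementation registry; not the dltaf API."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._items: Dict[str, T] = {}

    def register(self, name: str) -> Callable[[T], T]:
        """Return a decorator that records an implementation under `name`."""

        def decorator(obj: T) -> T:
            if name in self._items:
                raise ValueError(f"{self.label} plugin {name!r} is already registered")
            self._items[name] = obj
            return obj

        return decorator

    def get(self, name: str) -> T:
        """Look up an implementation, listing known names on failure."""
        if name not in self._items:
            known = ", ".join(sorted(self._items)) or "none"
            raise LookupError(f"unknown {self.label} plugin {name!r}; known: {known}")
        return self._items[name]


# One registry per extension point listed above.
runners: Registry[Callable[..., object]] = Registry("runner")
hooks: Registry[Callable[..., object]] = Registry("hook")
infra_checks: Registry[Callable[..., bool]] = Registry("infra-check")


# A private plugin module registers itself as a side effect of being imported.
@runners.register("customer_export")  # hypothetical plugin name
def run_customer_export(manifest: dict) -> str:
    # Placeholder body; a real runner would extract and load data here.
    return "customer_export done"
```

Rejecting duplicate names surfaces accidental double registration, for example when two private packages claim the same runner.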

You can load private modules either directly from a manifest or from the environment:

```yaml
run:
  runners:
    plugins:
      - internal.dltaf_plugins.customer_export.runner_plugin
  hooks:
    plugins:
      - internal.dltaf_plugins.shared.hooks
  online_checks:
    plugins:
      - internal.dltaf_plugins.customer_export.infra_checks
```

Or through environment variables:

```bash
export DLT_RUNNER_PLUGINS="internal.dltaf_plugins.customer_export.runner_plugin"
export DLT_HOOK_PLUGINS="internal.dltaf_plugins.shared.hooks"
export DLT_INFRA_CHECK_PLUGINS="internal.dltaf_plugins.customer_export.infra_checks"
```
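Both declaration styles can feed the same import step. The sketch below is a hypothetical loader and not `dltaf` code. It merges the manifest lists with the environment variables shown above, putting manifest entries first and dropping duplicates. It assumes that an environment variable may hold several module paths separated by commas, which this README does not specify. It also assumes, as is common for registry-based plugins, that importing a module is what registers it.

```python
import importlib
import os
from types import ModuleType
from typing import Any, Dict, List, Mapping, Optional

# Manifest section under `run` -> environment variable, as listed above.
PLUGIN_SOURCES = {
    "runners": "DLT_RUNNER_PLUGINS",
    "hooks": "DLT_HOOK_PLUGINS",
    "online_checks": "DLT_INFRA_CHECK_PLUGINS",
}


def plugin_paths(
    run_section: Mapping[str, Any],
    section: str,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Merge manifest and environment plugin module paths, keeping first-seen order."""
    env = os.environ if environ is None else environ
    from_manifest = (run_section.get(section) or {}).get("plugins") or []
    # Assumption: comma-separated module paths in the environment variable.
    raw = env.get(PLUGIN_SOURCES[section], "")
    from_env = [p.strip() for p in raw.split(",") if p.strip()]
    merged: List[str] = []
    for path in [*from_manifest, *from_env]:
        if path not in merged:
            merged.append(path)
    return merged


def load_plugins(paths: List[str]) -> Dict[str, ModuleType]:
    """Import each module path; an ImportError surfaces a misconfigured plugin early."""
    return {path: importlib.import_module(path) for path in paths}
```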

This keeps the manifest contract stable even if the private catalog later moves from a monorepo to a private wheel.

The roadmap for evolving this split between OSS core and private integrations lives in [ROADMAP.md](ROADMAP.md).

## Vault integration

`dltaf` resolves manifest Vault references through `vault-kv-client`.

Supported reference forms:

- `vault://mount/path`
- `mount:path`
- mapping form with `mount_point`, `path`, and optional `kv_version`
- mapping form with `ref` plus explicit `kv_version`
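To make the four forms concrete, here is a hypothetical normalizer that reduces each of them to a mount point, a path, and an optional KV version. It sketches the documented forms only and is not the `dltaf` or `vault-kv-client` implementation. It makes three assumptions:

- the first segment after `vault://` is the mount;
- `mount:path` splits on the first colon;
- `kv_version` stays `None` when a form does not state it.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Union


@dataclass(frozen=True)
class VaultRef:
    mount_point: str
    path: str
    kv_version: Optional[str] = None


def _parse_string(ref: str, kv_version: Optional[str] = None) -> VaultRef:
    if ref.startswith("vault://"):
        # vault://mount/path: first segment is the mount, the rest is the path.
        mount, _, path = ref[len("vault://"):].partition("/")
    else:
        # mount:path: split on the first colon only.
        mount, _, path = ref.partition(":")
    if not mount or not path:
        raise ValueError(f"unrecognized Vault reference: {ref!r}")
    return VaultRef(mount, path, kv_version)


def parse_vault_ref(value: Union[str, Mapping[str, Any]]) -> VaultRef:
    """Normalize the four documented reference forms (sketch, not dltaf's parser)."""
    if isinstance(value, str):
        return _parse_string(value)
    version = value.get("kv_version")
    version = None if version is None else str(version)
    if "ref" in value:
        # Mapping form with `ref` plus explicit `kv_version`.
        return _parse_string(str(value["ref"]), version)
    # Mapping form with `mount_point`, `path`, and optional `kv_version`.
    return VaultRef(str(value["mount_point"]), str(value["path"]), version)
```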

Recommended explicit KV v2 pattern:

```yaml
connections:
  source:
    kind: postgres
    vault:
      ref: ${ENV:POSTGRES__VAULT_REF|company:postgres/example}
      kv_version: "2"
```

That contract is intentionally simple and portable across local runs, CI, and Airflow.
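The `${ENV:POSTGRES__VAULT_REF|company:postgres/example}` placeholder in the example above reads as "use the environment variable if it is set, otherwise fall back to the literal after `|`". The sketch below implements that reading with a regular expression. Its finer semantics are assumptions, not documented `dltaf` behavior:

- a set-but-empty variable counts as set;
- the default cannot contain `}`;
- a missing variable without a default is an error.

```python
import os
import re
from typing import Mapping, Optional

# ${ENV:NAME} or ${ENV:NAME|default}; the default may not contain '}'.
_ENV_PLACEHOLDER = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)(?:\|([^}]*))?\}")


def expand_env(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace ENV placeholders; raise KeyError if a variable is unset and has no default."""
    env = os.environ if environ is None else environ

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            # A set variable wins, even if it is an empty string.
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _ENV_PLACEHOLDER.sub(substitute, value)
```

Raising on a missing variable without a default makes a misconfigured Airflow or CI environment fail loudly instead of resolving an empty Vault path.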

## Shipped examples

Canonical examples live under `dltaf/examples/manifests/`:

- `smoke_sqldb_catalog.yaml`
- `smoke_sqldb_query.yaml`
- `smoke_mongodb.yaml`

Compatibility examples are also shipped for migration and search continuity:

- `smoke_sql_database_catalog.yaml`
- `smoke_oracle_custom_sql.yaml`
- `smoke_mongodb_catalog.yaml`

All examples are sanitized. Replace the sample Vault refs and connection overrides with values from your own environment.

## Documentation

Full docs live on GitHub Pages:

- Docs: https://paulkov.github.io/dltaf/
- Getting started: https://paulkov.github.io/dltaf/getting-started/
- Installation profiles: https://paulkov.github.io/dltaf/installation-profiles/
- Examples: https://paulkov.github.io/dltaf/examples/
- Plugins: https://paulkov.github.io/dltaf/plugins/
- Airflow: https://paulkov.github.io/dltaf/airflow/

## Development

Run the standard checks locally:

```bash
ruff check .
pytest
python -m build
mkdocs build --strict
```

## Roadmap

The near-term focus is:

- keep `sqldb` and `mongodb` boring, explicit, and stable
- improve self-service docs, templates, and examples
- make private registries easy to adopt from a monorepo or a private package index
- preserve compatibility aliases long enough for staged migrations without surprise breakage

## License

Apache-2.0. See [LICENSE](LICENSE).
