Metadata-Version: 2.4
Name: dpone
Version: 0.2.6
Summary: Declarative ETL framework for YAML-driven data pipelines
Keywords: etl,data-platform,data-engineering,yaml,dag,bigquery,postgres,mssql,sql-server,clickhouse,kafka
Author: PaulKov
License-Expression: Apache-2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Distributed Computing
Classifier: Typing :: Typed
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: jinja2>=3.1.0
Requires-Dist: sqlglot==27.13.2
Requires-Dist: pendulum==3.1.0
Requires-Dist: prettytable==3.17.0
Requires-Dist: pytz>=2023.3
Requires-Dist: idna>=3.15
Requires-Dist: requests>=2.33.0
Requires-Dist: urllib3>=2.7.0
Requires-Dist: ijson>=3.3.0
Requires-Dist: tomli>=2.0,<3 ; python_full_version < '3.11'
Requires-Dist: clickhouse-connect==0.6.22 ; extra == 'clickhouse'
Requires-Dist: clickhouse-driver==0.2.9 ; extra == 'clickhouse'
Requires-Dist: clickhouse-cityhash==1.0.2.5 ; extra == 'clickhouse'
Requires-Dist: psycopg[binary]==3.2.9 ; extra == 'full'
Requires-Dist: clickhouse-connect==0.6.22 ; extra == 'full'
Requires-Dist: clickhouse-driver==0.2.9 ; extra == 'full'
Requires-Dist: clickhouse-cityhash==1.0.2.5 ; extra == 'full'
Requires-Dist: pyodbc>=5.2,<6 ; extra == 'full'
Requires-Dist: google-cloud-bigquery==3.38.0 ; extra == 'full'
Requires-Dist: google-cloud-storage==3.7.0 ; extra == 'full'
Requires-Dist: protobuf>=6.33.5 ; extra == 'full'
Requires-Dist: pyasn1>=0.6.3 ; extra == 'full'
Requires-Dist: pandas==2.1.4 ; extra == 'full'
Requires-Dist: numpy==1.26.4 ; extra == 'full'
Requires-Dist: vault-kv-client>=0.1.0,<0.2.0 ; extra == 'full'
Requires-Dist: google-ads>=28.0.0,<29.0.0 ; extra == 'full'
Requires-Dist: confluent-kafka>=2.14,<3 ; extra == 'full'
Requires-Dist: fastavro>=1.11,<2 ; extra == 'full'
Requires-Dist: jsonschema>=4.23,<5 ; extra == 'full'
Requires-Dist: httpx>=0.28,<1 ; extra == 'full'
Requires-Dist: authlib>=1.3,<2 ; extra == 'full'
Requires-Dist: cachetools>=5,<7 ; extra == 'full'
Requires-Dist: orjson>=3.10,<4 ; extra == 'full'
Requires-Dist: google-cloud-bigquery==3.38.0 ; extra == 'gcp'
Requires-Dist: google-cloud-storage==3.7.0 ; extra == 'gcp'
Requires-Dist: protobuf>=6.33.5 ; extra == 'gcp'
Requires-Dist: pyasn1>=0.6.3 ; extra == 'gcp'
Requires-Dist: google-ads>=28.0.0,<29.0.0 ; extra == 'google-ads'
Requires-Dist: protobuf>=6.33.5 ; extra == 'google-ads'
Requires-Dist: pyasn1>=0.6.3 ; extra == 'google-ads'
Requires-Dist: confluent-kafka>=2.14,<3 ; extra == 'kafka'
Requires-Dist: fastavro>=1.11,<2 ; extra == 'kafka'
Requires-Dist: jsonschema>=4.23,<5 ; extra == 'kafka'
Requires-Dist: protobuf>=6.33.5 ; extra == 'kafka'
Requires-Dist: googleapis-common-protos>=1.70,<2 ; extra == 'kafka'
Requires-Dist: httpx>=0.28,<1 ; extra == 'kafka'
Requires-Dist: authlib>=1.3,<2 ; extra == 'kafka'
Requires-Dist: cachetools>=5,<7 ; extra == 'kafka'
Requires-Dist: orjson>=3.10,<4 ; extra == 'kafka'
Requires-Dist: pyodbc>=5.2,<6 ; extra == 'mssql'
Requires-Dist: pandas==2.1.4 ; extra == 'pandas'
Requires-Dist: numpy==1.26.4 ; extra == 'pandas'
Requires-Dist: psycopg[binary]==3.2.9 ; extra == 'postgres'
Requires-Dist: vault-kv-client>=0.1.0,<0.2.0 ; extra == 'vault'
Maintainer: PaulKov
Requires-Python: >=3.10, <3.13
Project-URL: Homepage, https://github.com/PaulKov/dpone
Project-URL: Repository, https://github.com/PaulKov/dpone
Project-URL: Issues, https://github.com/PaulKov/dpone/issues
Project-URL: Documentation, https://paulkov.github.io/dpone/
Project-URL: Changelog, https://github.com/PaulKov/dpone/blob/master/CHANGELOG.md
Provides-Extra: clickhouse
Provides-Extra: full
Provides-Extra: gcp
Provides-Extra: google-ads
Provides-Extra: kafka
Provides-Extra: mssql
Provides-Extra: pandas
Provides-Extra: postgres
Provides-Extra: vault
Description-Content-Type: text/markdown

# dpone

[![PyPI](https://img.shields.io/badge/pypi-v0.2.6-ea7233.svg)](https://pypi.org/project/dpone/)
[![Python](https://img.shields.io/pypi/pyversions/dpone.svg)](https://pypi.org/project/dpone/)
[![CI](https://github.com/PaulKov/dpone/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/PaulKov/dpone/actions/workflows/ci.yml)
[![Docs](https://github.com/PaulKov/dpone/actions/workflows/pages.yml/badge.svg?branch=master)](https://paulkov.github.io/dpone/)
[![License](https://img.shields.io/pypi/l/dpone.svg)](LICENSE)

`dpone` is a Python ETL framework for declarative, YAML-driven data pipelines. It helps data teams describe sources, sinks, load strategies, dependencies, conventions, and operational checks as reusable configuration instead of one-off scripts.

The public package name, import name, GitHub repository name, and CLI name are all intentionally short: `dpone`.

Repository: https://github.com/PaulKov/dpone

Install from PyPI and confirm the CLI runs:

```bash
python -m pip install dpone
dpone --help
```

## Documentation map

Start here if you are evaluating or operating dpone:

- [Documentation index](docs/README.md)
- [CLI reference](docs/CLI_REFERENCE.md)
- [Connector overview](docs/CONNECTORS.md)
- [Source -> sink matrix](docs/SOURCE_SINK_MATRIX.md)
- [Manual integration matrix](docs/testing/manual_integration_matrix.md)
- [CI/CD](docs/CI_CD.md)
- [Testing runbooks](docs/testing/index.md)
- [Load strategies](docs/LOAD_STRATEGIES.md)
- [Type mapping matrix](docs/TYPE_MAPPING_MATRIX.md)
- [Schema evolution](docs/SCHEMA_EVOLUTION.md)
- [Production readiness](docs/PRODUCTION_READINESS.md)
- [Architecture](docs/ARCHITECTURE.md)


## What dpone gives you

- YAML manifests for single-process and batch ETL definitions.
- Built-in DAG/dependency inspection for pipeline debugging.
- Runtime abstractions for sources, sinks, connectors, state, reconciliation, and safe SQL logging.
- Optional integrations for PostgreSQL, MSSQL/SQL Server, ClickHouse, BigQuery/GCS, Kafka, pandas, Google Ads, and HashiCorp Vault.
- A CLI designed for self-service validation, rendering, explainability, and documentation checks.
- Compatibility shims for older import paths while the canonical package layout continues to stabilize.

## Installation

Install the core package from PyPI:

```bash
pip install dpone
```

Install common extras for local ETL development:

```bash
pip install "dpone[postgres,mssql,clickhouse,kafka,gcp,pandas,vault]"
```

Install everything currently published by the project:

```bash
pip install "dpone[full]"
```

With `uv`:

```bash
uv add dpone            # core package only
uv add "dpone[full]"    # or: core package plus every optional extra
```
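
Each extra is a named group of `Requires-Dist` entries in the package metadata, selected by an `extra == '...'` environment marker. To see what an extra pulls in, you can group those entries yourself. The sketch below is a hypothetical helper built on the standard library. When dpone is installed, `importlib.metadata.requires("dpone")` returns the same requirement strings shown in the metadata header, though the quoting may differ.

```python
import re
from collections import defaultdict
from importlib import metadata

# Matches the `extra == 'name'` marker (single or double quotes) in a Requires-Dist entry.
EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def group_by_extra(requirements: list[str]) -> dict[str, list[str]]:
    """Group Requires-Dist strings by extra; entries without an extra go under 'core'.

    Markers other than `extra` (e.g. python_full_version) are ignored here.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for req in requirements:
        spec, _, marker = req.partition(";")
        match = EXTRA_MARKER.search(marker)
        groups[match.group(1) if match else "core"].append(spec.strip())
    return dict(groups)


if __name__ == "__main__":
    try:
        reqs = metadata.requires("dpone") or []
    except metadata.PackageNotFoundError:
        reqs = []
    for extra, specs in sorted(group_by_extra(reqs).items()):
        print(f"{extra}: {', '.join(specs)}")
```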

## Optional extras

| Extra | Purpose |
| --- | --- |
| `postgres` | PostgreSQL connectivity via `psycopg` |
| `mssql` | Microsoft SQL Server connectivity via `pyodbc`; production bulk loads also require the separately installed Microsoft ODBC Driver 18 for SQL Server and the `bcp` utility |
| `clickhouse` | ClickHouse connectivity |
| `gcp` | Google BigQuery and Google Cloud Storage support |
| `kafka` | Kafka batch source/sink support via `confluent-kafka`, Schema Registry codecs, Avro, JSON Schema, and Protobuf helpers |
| `pandas` | DataFrame-based extract/load helpers |
| `vault` | HashiCorp Vault integration via public `vault-kv-client` |
| `google-ads` | Google Ads API support |
| `full` | All public extras above |
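
To check which extras are usable in the current environment, you can probe for a module installed by each extra's main dependency. The mapping below is an assumption derived from the dependency list in the package metadata (for example, `google-cloud-bigquery` provides `google.cloud.bigquery`); dpone does not expose it as an API. A module being present also does not prove that external tools such as ODBC Driver 18 or `bcp` are installed.

```python
import importlib.util

# Hypothetical mapping: extra name -> a module provided by that extra's primary dependency.
EXTRA_MODULES = {
    "postgres": "psycopg",
    "mssql": "pyodbc",
    "clickhouse": "clickhouse_connect",
    "gcp": "google.cloud.bigquery",
    "kafka": "confluent_kafka",
    "pandas": "pandas",
    "vault": "vault_kv_client",
    "google-ads": "google.ads.googleads",
}


def available_extras(mapping: dict[str, str] = EXTRA_MODULES) -> dict[str, bool]:
    """Report whether each extra's representative module can be located.

    find_spec does not execute the target module, but it does import parent
    packages of dotted names (e.g. "google", "google.cloud").
    """
    result = {}
    for extra, module in mapping.items():
        try:
            result[extra] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # A missing parent package raises ModuleNotFoundError (a subclass of ImportError).
            result[extra] = False
    return result


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:<12} {'available' if ok else 'missing'}")
```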

Vault support uses [`vault-kv-client`](https://github.com/PaulKov/vault-kv-client), published on PyPI as `vault-kv-client`. New code should use `vault_kv_client`; the historical `vault_client` import path remains supported by that package as a compatibility layer.

## Quick start

Create a batch manifest, for example `examples/batch/landing_postgres_to_bq.batch.yaml`:

```yaml
# yaml-language-server: $schema=../../src/dpone/schema/etl-batch-manifest.schema.json
kind: dpone.batch.v1
convention: landing_raw_v1
registry: ../registry/sources.yaml

vars:
  src_system: demo_source
  src_database: demo_db
  owner_team: data-platform
  owner_contact: data-platform@example.com
  sla: daily

defaults:
  source:
    type: postgres
    connection_type: vault
    connection_id: postgres-demo
    vault_path: postgres/demo-source
    options:
      batch_size: 100000
      export_format: csv

  sink:
    type: bigquery
    connection_type: vault
    connection_id: bigquery-demo
    vault_path: gcp/demo-project-prod/bq/service-account
    staging:
      schema: stg
    strategy:
      mode: full_refresh
      overwrite_type: exchange

schemas:
  public:
    tables:
      - core_city
```

Validate and render it:

```bash
dpone manifest validate examples/batch/landing_postgres_to_bq.batch.yaml \
  --profile landing_raw_v1 \
  --registry examples/registry/sources.yaml

dpone manifest render examples/batch/landing_postgres_to_bq.batch.yaml \
  --selector public.core_city \
  --registry examples/registry/sources.yaml
```
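
The `--selector` value addresses one task in the manifest's `schemas` block. Assuming selectors always take the `<schema>.<table>` form (the only form this README shows), you can list the available selectors from the parsed YAML, for example the result of `yaml.safe_load`. `list_selectors` is a hypothetical illustration; `dpone manifest list` is the authoritative way to enumerate what a manifest defines.

```python
from typing import Any


def list_selectors(manifest: dict[str, Any]) -> list[str]:
    """Return '<schema>.<table>' selectors for every table listed under `schemas`."""
    selectors = []
    for schema, spec in manifest.get("schemas", {}).items():
        for table in spec.get("tables", []):
            selectors.append(f"{schema}.{table}")
    return selectors


# The example manifest above as a parsed dict, trimmed to the relevant keys.
manifest = {
    "kind": "dpone.batch.v1",
    "schemas": {"public": {"tables": ["core_city"]}},
}
print(list_selectors(manifest))  # ['public.core_city']
```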

Inspect pipeline dependencies:

```bash
dpone dag report examples/batch/landing_postgres_to_bq.batch.yaml \
  --base-path . \
  --format json \
  --preset ci \
  --registry examples/registry/sources.yaml
```
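
The DAG commands report dependencies between tasks. The underlying idea: if each task lists the upstream tasks it depends on, any valid run order is a topological sort of that graph, and a cycle makes the graph unrunnable. The sketch below illustrates this with the standard library's `graphlib`. Task names other than `public.core_city` are hypothetical, and this is not how dpone builds its report.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return tasks ordered so that every upstream task precedes its dependents.

    `dependencies` maps each task to the set of tasks it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical graph: a mart table built from two landing tables.
deps = {
    "mart.city_stats": {"public.core_city", "public.core_country"},
    "public.core_city": set(),
    "public.core_country": set(),
}
print(run_order(deps))

try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError as exc:
    print("cycle detected:", exc.args[1])  # args[1] lists the nodes forming the cycle
```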

## CLI overview

```bash
dpone --help
dpone manifest --help
dpone dag --help
dpone docs --help
```

Common commands:

```bash
dpone manifest list examples/batch/landing_postgres_to_bq.batch.yaml
dpone manifest validate examples/batch/landing_postgres_to_bq.batch.yaml --recursive
dpone manifest render examples/batch/landing_postgres_to_bq.batch.yaml --selector public.core_city
dpone manifest explain examples/batch/landing_postgres_to_bq.batch.yaml --selector public.core_city --why sink.table.schema
dpone dag list-edges examples/batch/landing_postgres_to_bq.batch.yaml --with-groups --with-refs
dpone dag explain-node examples/batch/landing_postgres_to_bq.batch.yaml --task public.core_city
dpone dag report examples/batch/landing_postgres_to_bq.batch.yaml --preset ci --format md
```
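
In CI you may want to run several of these commands as one step that stops at the first failure. Below is a minimal sketch using `subprocess`. `run_steps` is a hypothetical helper, and the argument lists simply mirror two of the commands above.

```python
import shlex
import subprocess
import sys
from collections.abc import Sequence

MANIFEST = "examples/batch/landing_postgres_to_bq.batch.yaml"

STEPS = [
    ["dpone", "manifest", "validate", MANIFEST, "--recursive"],
    ["dpone", "dag", "report", MANIFEST, "--preset", "ci", "--format", "md"],
]


def run_steps(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0 if all pass."""
    for argv in steps:
        print("+", shlex.join(argv), file=sys.stderr, flush=True)  # echo the command, like `set -x`
        try:
            completed = subprocess.run(list(argv), check=False)
        except FileNotFoundError:
            return 127  # executable not found; same code a POSIX shell uses
        if completed.returncode != 0:
            return completed.returncode
    return 0
```

From a CI script, `sys.exit(run_steps(STEPS))` propagates the failing command's exit code to the job.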

## Repository layout

```text
src/dpone/      Python package source code
docs/           User and developer documentation
examples/       Public example manifests and registries
tests/          Unit and integration tests
tools/          Local smoke and release helper scripts
```

Canonical imports live under:

- `dpone.manifest.*`
- `dpone.dag.*`
- `dpone.runtime.*`
- `dpone.contracts.*`
- `dpone.ports.*`
- `dpone.adapters.*`

Legacy paths such as `dpone.core.*`, `dpone.lib.*`, `dpone.source.*`, and `dpone.sink.*` are compatibility shims. Prefer canonical imports for new code.
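
Before migrating an existing codebase, you can locate legacy imports with Python's `ast` module. The sketch below flags only the four legacy prefixes listed above. `find_legacy_imports` is a hypothetical helper, not a dpone tool, and it does not know which canonical module replaces each legacy one.

```python
import ast

LEGACY_PREFIXES = ("dpone.core", "dpone.lib", "dpone.source", "dpone.sink")


def _is_legacy(module: str) -> bool:
    # Match a prefix itself or one of its submodules, but not e.g. "dpone.sinks".
    return any(module == p or module.startswith(p + ".") for p in LEGACY_PREFIXES)


def find_legacy_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, module path) pairs for imports of legacy dpone paths."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [(node.lineno, a.name) for a in node.names if _is_legacy(a.name)]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if _is_legacy(node.module):
                hits.append((node.lineno, node.module))
            else:
                # Catches `from dpone import core`, which resolves to the dpone.core module.
                for a in node.names:
                    full = f"{node.module}.{a.name}"
                    if _is_legacy(full):
                        hits.append((node.lineno, full))
    return sorted(hits)
```

For example, `find_legacy_imports(pathlib.Path("jobs/load.py").read_text())` (a hypothetical path) returns the lines to update.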

## Local development

```bash
uv sync --all-extras
uv run ruff check .
uv run ruff format --check .
uv run mypy --config-file mypy.ini
uv run pytest -m "not integration_live"
```

Build package artifacts:

```bash
uv build
```

Run the package smoke script from an installed environment:

```bash
python tools/package_smoke.py --project-root . --dpone-cmd dpone
```

## CI and releases

The OSS repository uses GitHub Actions as the primary automation path. See [CI/CD](docs/CI_CD.md) for the workflow map, detailed runbooks, artifacts, and developer guidance.

Key workflows:

- `.github/workflows/ci.yml` runs linting, formatting checks, type checks, tests, coverage, the package build, and the PostgreSQL `xmin` integration tests.
- `.github/workflows/pages.yml` builds and deploys the GitHub Pages documentation site from `master`.
- `.github/workflows/release.yml` builds and publishes tagged releases to PyPI.
- `.github/workflows/integration-matrix.yml` and `.github/workflows/connector-certification.yml` provide manual/scheduled production-confidence gates.

Release tags use the format `vX.Y.Z`, for example:

```bash
git tag -a vX.Y.Z -m "Release vX.Y.Z"
git push origin vX.Y.Z
```
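
A release job can refuse to publish when the pushed tag disagrees with the package version. The sketch below assumes plain `X.Y.Z` versions, matching the tag format above. In GitHub Actions, the tag name of a tag push is available as the `GITHUB_REF_NAME` environment variable, and the version can be read from the built package's metadata. `tag_matches_version` is a hypothetical helper, not part of `release.yml`.

```python
import re

TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def tag_matches_version(tag: str, version: str) -> bool:
    """True only if `tag` is exactly 'vX.Y.Z' and X.Y.Z equals the package version string."""
    match = TAG_PATTERN.fullmatch(tag)
    return match is not None and ".".join(match.groups()) == version
```

Usage in a release step might look like `tag_matches_version(os.environ["GITHUB_REF_NAME"], importlib.metadata.version("dpone"))`. Pre-release suffixes such as `rc1` are rejected by design.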

Prefer PyPI Trusted Publishing for releases. Token-based publishing should only be used as a fallback with short-lived, scoped tokens.

## Security

Never commit API tokens, PyPI tokens, GitHub tokens, Vault credentials, service-account JSON, or live vendor credentials. If a secret is ever pasted into an issue, chat, commit, or CI log, revoke it before publishing or pushing public history.
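
As an extra safeguard, a pre-commit hook can reject files containing obvious credential shapes. The sketch below checks a few well-known formats. PyPI API tokens start with `pypi-`. GitHub tokens use prefixes such as `ghp_` or `github_pat_`. Recent Vault service tokens start with `hvs.`. Service-account JSON embeds a PEM `PRIVATE KEY` block. `find_secrets` is a hypothetical helper, its patterns are illustrative only, and it will miss many secret types, so it does not replace a dedicated secret scanner.

```python
import re

# Illustrative patterns only; dedicated scanners maintain much larger rule sets.
SECRET_PATTERNS = {
    "pypi_token": re.compile(r"pypi-[A-Za-z0-9_-]{16,}"),
    "github_token": re.compile(r"\b(?:gh[pousr]_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{20,})"),
    "vault_token": re.compile(r"\bhvs\.[A-Za-z0-9_-]{20,}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every line matching a secret pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A hook would run `find_secrets` over each staged file and exit non-zero when the result is non-empty.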

See [Security policy](SECURITY.md) for the vulnerability reporting process.


## License

`dpone` is licensed under the Apache License 2.0. See [LICENSE](LICENSE).
