Metadata-Version: 2.4
Name: prefect-datahub
Version: 1.6.0.10rc3
Summary: DataHub Prefect plugin — automatically capture flow lineage and run metadata from Prefect workflows into your DataHub catalog
Home-page: https://datahub.com/
License: Apache-2.0
Project-URL: Documentation, https://docs.datahub.com/
Project-URL: Source, https://github.com/datahub-project/datahub
Project-URL: Changelog, https://github.com/acryldata/datahub/releases
Project-URL: Releases, https://github.com/acryldata/datahub/releases
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX :: Linux
Classifier: Environment :: Console
Classifier: Environment :: MacOS X
Classifier: Topic :: Software Development
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: asyncpg>=0.30.0
Requires-Dist: requests
Requires-Dist: requests_file
Requires-Dist: acryl-datahub[datahub-rest]==1.6.0.10rc3
Requires-Dist: prefect<4.0.0,>=3.0.0
Provides-Extra: dev
Requires-Dist: asyncpg>=0.30.0; extra == "dev"
Requires-Dist: types-pytz; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: deepdiff; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: tox; extra == "dev"
Requires-Dist: requests-mock; extra == "dev"
Requires-Dist: types-python-dateutil; extra == "dev"
Requires-Dist: requests_file; extra == "dev"
Requires-Dist: types-toml; extra == "dev"
Requires-Dist: jsonpickle; extra == "dev"
Requires-Dist: freezegun; extra == "dev"
Requires-Dist: types-six; extra == "dev"
Requires-Dist: prefect<4.0.0,>=3.0.0; extra == "dev"
Requires-Dist: types-click==0.1.12; extra == "dev"
Requires-Dist: pydantic>=2.0.0; extra == "dev"
Requires-Dist: types-freezegun; extra == "dev"
Requires-Dist: pytest-asyncio>=0.16.0; extra == "dev"
Requires-Dist: pytest-cov>=2.8.1; extra == "dev"
Requires-Dist: packaging; extra == "dev"
Requires-Dist: sqlalchemy-stubs; extra == "dev"
Requires-Dist: types-PyYAML; extra == "dev"
Requires-Dist: requests; extra == "dev"
Requires-Dist: types-setuptools; extra == "dev"
Requires-Dist: coverage>=5.1; extra == "dev"
Requires-Dist: acryl-datahub[datahub-rest]==1.6.0.10rc3; extra == "dev"
Requires-Dist: types-dataclasses; extra == "dev"
Requires-Dist: ruff==0.11.7; extra == "dev"
Requires-Dist: types-cachetools; extra == "dev"
Requires-Dist: mypy==1.17.1; extra == "dev"
Requires-Dist: pytest>=6.2.2; extra == "dev"
Requires-Dist: types-tabulate; extra == "dev"
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

<!-- PyPI long description. Keep concise, feature-discovery-first. -->

# DataHub Prefect Plugin

**Automatic lineage and run metadata from Prefect into DataHub** — captures flow structure, task inputs/outputs, and run history with minimal setup.

## What you can do

- **Emit flow and task metadata** to DataHub as your pipelines run
- **Capture dataset lineage** — declare inputs and outputs per task and see them in DataHub
- **Configure via Prefect blocks** — store your DataHub connection settings as a reusable block
- **Works with any DataHub deployment** — self-hosted or DataHub Cloud

## Installation

```bash
pip install prefect-datahub
```
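
To confirm the install from Python, a minimal sketch using only the standard library might look like the following. The helper name `installed_version` is hypothetical; `importlib.metadata` is the standard-library way to query an installed distribution's version, and the Python check mirrors the `Requires-Python: >=3.10` constraint declared by this package.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# The package metadata declares Requires-Python >=3.10.
python_ok = sys.version_info >= (3, 10)

print("python >= 3.10:", python_ok)
print("prefect-datahub:", installed_version("prefect-datahub"))
```

If the second line prints `None`, the package is not visible to the interpreter you are running, which usually points to a different virtual environment.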

## Quickstart

### 1. Save your DataHub connection as a Prefect block

```python
from prefect_datahub.datahub_emitter import DatahubEmitter

DatahubEmitter(
    datahub_rest_url="http://localhost:8080",
    env="PROD",
).save("my-datahub")
```

### 2. Use it in your flows

```python
from prefect import flow, task
from prefect_datahub.datahub_emitter import DatahubEmitter
from prefect_datahub.entities import Dataset

# Load the block saved in step 1.
emitter = DatahubEmitter.load("my-datahub")

@task
def extract():
    # Placeholder source; replace with your own extraction logic.
    return "raw data"

@task
def transform(data, emitter):
    # Register this task's input and output datasets for lineage.
    emitter.add_task(
        inputs=[Dataset("snowflake", "mydb.schema.source_table")],
        outputs=[Dataset("snowflake", "mydb.schema.output_table")],
    )
    return data

@flow
def my_pipeline():
    data = extract()
    transform(data, emitter)
    emitter.emit_flow()  # required: emits all collected metadata at the end
```
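
For orientation, each `Dataset(platform, name)` declared in a task corresponds to a DataHub dataset URN. The sketch below rebuilds that URN with plain string formatting so you can predict where lineage should appear in the catalog. The function `dataset_urn` is hypothetical and not part of the plugin; it assumes the standard DataHub dataset URN layout and that a platform instance, when set, is prefixed to the dataset name.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Illustrative only: format a DataHub dataset URN from its parts."""
    if platform_instance:
        # Assumption: the instance is prepended to the dataset name.
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


print(dataset_urn("snowflake", "mydb.schema.source_table"))
# urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.source_table,PROD)
```

Searching DataHub for the printed URN is a quick way to check that a flow run emitted the lineage you expected.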

## Configuration options

| Option              | Default                 | Description                         |
| ------------------- | ----------------------- | ----------------------------------- |
| `datahub_rest_url`  | `http://localhost:8080` | DataHub GMS URL                     |
| `env`               | `PROD`                  | Environment tag for assets          |
| `platform_instance` | `None`                  | Platform instance for assets        |
| `token`             | `None`                  | Auth token (if GMS auth is enabled) |
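
One way to keep these options out of source code is to read them from environment variables and pass them to `DatahubEmitter`. The sketch below only builds the keyword arguments, so it runs without the plugin installed. The environment variable names (`PREFECT_DATAHUB_REST_URL`, `PREFECT_DATAHUB_ENV`, `PREFECT_DATAHUB_PLATFORM_INSTANCE`, `PREFECT_DATAHUB_TOKEN`) and the helper `emitter_kwargs` are assumptions made for illustration; the plugin does not read them itself.

```python
import os
from typing import Mapping, Optional


def emitter_kwargs(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Build DatahubEmitter keyword arguments, falling back to the table's defaults."""
    env = os.environ if environ is None else environ
    return {
        # Hypothetical variable names; rename to fit your deployment.
        "datahub_rest_url": env.get("PREFECT_DATAHUB_REST_URL", "http://localhost:8080"),
        "env": env.get("PREFECT_DATAHUB_ENV", "PROD"),
        "platform_instance": env.get("PREFECT_DATAHUB_PLATFORM_INSTANCE"),
        "token": env.get("PREFECT_DATAHUB_TOKEN"),
    }


kwargs = emitter_kwargs({"PREFECT_DATAHUB_TOKEN": "example-token"})
# Leave the token out of anything that gets printed or logged.
print({k: v for k, v in kwargs.items() if k != "token"})
```

With the plugin installed, `DatahubEmitter(**emitter_kwargs()).save("my-datahub")` should store the result as a block in the same way as the Quickstart.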

## Links

- [Full documentation](https://docs.datahub.com/docs/lineage/prefect)
- [Prefect](https://www.prefect.io/)
- [GitHub](https://github.com/datahub-project/datahub)
- [Slack community](https://datahub.com/slack)
