Metadata-Version: 2.4
Name: airflow-provider-dqlens
Version: 0.1.0
Summary: Apache Airflow provider for DQLens data quality checks
Author-email: Vahid <vahid@dqlens.dev>
License: MIT
Project-URL: Homepage, https://github.com/vahid110/dqlens
Project-URL: Repository, https://github.com/vahid110/dqlens
Keywords: airflow,data-quality,dqlens,testing,profiling
Classifier: Development Status :: 3 - Alpha
Classifier: Framework :: Apache Airflow
Classifier: Framework :: Apache Airflow :: Provider
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Database
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: apache-airflow>=2.0
Requires-Dist: dqlens>=0.4.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# airflow-provider-dqlens

[![PyPI](https://img.shields.io/pypi/v/airflow-provider-dqlens)](https://pypi.org/project/airflow-provider-dqlens/)

Apache Airflow provider for DQLens. Add auto-generated data quality checks to your DAGs with one task.

## Install

```bash
pip install airflow-provider-dqlens
```

## Usage

```python
from dqlens_airflow import DQLensOperator

quality_check = DQLensOperator(
    task_id="dqlens_quality_check",
    conn_id="my_postgres",
    schema="public",
    focus="high",
)

load_data >> quality_check >> downstream_tasks
```

If DQLens finds problems at or above your `focus` severity (and `fail_on_findings` is `True`), the task fails and downstream tasks don't run.

## Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| `conn_id` | Yes | | Airflow connection ID |
| `schema` | No | `public` | Schema to profile |
| `focus` | No | `all` | Severity filter: `high`, `medium`, `all` |
| `quick` | No | `False` | Sampled profiling (faster) |
| `tables` | No | None | Specific tables to profile |
| `exclude_tables` | No | None | Tables to skip (glob patterns) |
| `fail_on_findings` | No | `True` | Fail task if problems found |

## What it does

1. Reads credentials from the Airflow connection named by `conn_id`
2. Profiles the selected tables (null rates, uniqueness, patterns, FKs, freshness)
3. Compares against previous profile (drift detection)
4. Fails the task if findings exceed your severity threshold
5. Pushes results to XCom for downstream use

## XCom output

```python
{
    "findings_count": 3,
    "total_findings": 3,
    "passed_count": 47,
    "tables_profiled": 6,
    "findings": [
        {"table": "public.orders", "column": "email", "severity": "HIGH", "message": "..."},
    ]
}
```
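
Downstream tasks can pull this payload with `xcom_pull` and act on it. A minimal sketch of a summary helper, assuming the payload shape shown above (the function name is ours, not part of the provider):

```python
def summarize_dqlens_result(result: dict) -> str:
    """Build a one-line summary from a DQLens XCom payload (shape as above)."""
    high = [f for f in result.get("findings", []) if f["severity"] == "HIGH"]
    return (
        f"{result['findings_count']} finding(s) "
        f"({len(high)} high) across {result['tables_profiled']} table(s), "
        f"{result['passed_count']} checks passed"
    )
```

In a task downstream of the check, you would feed it `ti.xcom_pull(task_ids="dqlens_quality_check")`.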

## Supported databases

PostgreSQL, DuckDB, SQLite, MySQL (via Airflow connection types).
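
These are standard Airflow connections. For example, the `my_postgres` connection used above could be registered via the Airflow CLI (host, credentials, and database name are placeholders):

```shell
# Register a Postgres connection named "my_postgres" (values are placeholders)
airflow connections add my_postgres \
    --conn-uri "postgresql://user:password@db-host:5432/mydb"
```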

## License

MIT
