Metadata-Version: 2.4
Name: airflow-dag-audit
Version: 0.1.0
Summary: Static budget checks and hash stability checks for Apache Airflow DAG files.
Project-URL: Homepage, https://github.com/example/airflow-dag-audit
Project-URL: Repository, https://github.com/example/airflow-dag-audit
Project-URL: Issues, https://github.com/example/airflow-dag-audit/issues
Author: Konrad Rymczak
License-Expression: MIT
License-File: LICENSE
Keywords: airflow,audit,cli,dag,pytest
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Requires-Dist: rich>=13.9.0
Requires-Dist: tomli>=2.0.1; python_version < '3.11'
Requires-Dist: typer>=0.16.0
Provides-Extra: airflow
Requires-Dist: apache-airflow>=2.8; extra == 'airflow'
Description-Content-Type: text/markdown

# airflow-dag-audit

`airflow-dag-audit` provides static checks for DAG files and an optional reparse hash check.
It is designed for local development, `pytest`, and command-line use.

The package does not start a scheduler, triggerer, webserver, or database service. It focuses on:

- AST-based counts such as imports, `Variable.get(...)` calls, SQL-like string literals, and top-level calls
- a hash stability check that reparses the same DAG file twice
- a pytest plugin for project defaults and per-test overrides
- a CLI that can be installed as a tool and executed with `uvx`

## Trademark notice

Apache Airflow, Apache, and related marks belong to The Apache Software Foundation.
This project is not affiliated with, endorsed by, or sponsored by The Apache Software Foundation.
It is a third-party helper package for DAG repositories.

## Installation

### Library dependency

```bash
uv add airflow-dag-audit
```

If the environment that runs the hash check does not already have Apache Airflow installed, you can install the optional extra:

```bash
uv add 'airflow-dag-audit[airflow]'
```

### CLI tool

Once the package is published on PyPI, the CLI can be run without creating a project environment:

```bash
uvx airflow-dag-audit --help
```

## What is checked

### Static metrics

The AST analysis currently reports:

- `import_count`
- `variable_get_count` for `Variable.get(...)`
- `sql_query_count` for string literals that look like SQL
- `top_level_call_count`
- `detected_dag_decorators`
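
As a rough illustration of how counts like these can be derived from an AST, here is a minimal sketch that uses only the standard-library `ast` module. It is not the package's implementation: the function name `count_static_metrics`, the SQL keyword heuristic, and the definition of a "top-level call" are assumptions made for this example, and `detected_dag_decorators` is left out.

```python
import ast

# Assumption: a string "looks like SQL" when it starts with one of these keywords.
SQL_KEYWORDS = ("select ", "insert ", "update ", "delete ", "create ", "drop ")


def count_static_metrics(source: str) -> dict[str, int]:
    """Count a few import-time cost indicators in DAG source code (hypothetical helper)."""
    tree = ast.parse(source)
    metrics = {
        "import_count": 0,
        "variable_get_count": 0,
        "sql_query_count": 0,
        "top_level_call_count": 0,
    }

    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            metrics["import_count"] += 1
        elif isinstance(node, ast.Call):
            func = node.func
            # Matches only the literal spelling `Variable.get(...)`.
            if (
                isinstance(func, ast.Attribute)
                and func.attr == "get"
                and isinstance(func.value, ast.Name)
                and func.value.id == "Variable"
            ):
                metrics["variable_get_count"] += 1
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if node.value.lstrip().lower().startswith(SQL_KEYWORDS):
                metrics["sql_query_count"] += 1

    # Assumption: a top-level call is a module-level expression or assignment
    # whose value is a call, so it runs on every import of the file.
    for stmt in tree.body:
        value = getattr(stmt, "value", None)
        if isinstance(stmt, (ast.Expr, ast.Assign)) and isinstance(value, ast.Call):
            metrics["top_level_call_count"] += 1

    return metrics


SAMPLE = '''
import os
from airflow.models import Variable

env = Variable.get("env")
QUERY = "SELECT id FROM users"
print(env)
'''

print(count_static_metrics(SAMPLE))
```

Because the source is only parsed and never executed, a sketch like this runs without Apache Airflow installed.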

### Stable hash

With `require_stable_hash=True`, the package parses a DAG file twice and compares the canonical serialized payloads of the two runs.

- If Apache Airflow is importable, the worker tries to serialize matching DAG objects with `SerializedDAG.to_dict(...)`.
- Otherwise, it falls back to a generic serializer for DAG-like Python objects.

The check is useful for detecting DAG definitions that mutate during import or serialization.
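
The idea behind the check can be sketched with the standard library alone. The example below is a simplified stand-in, not the package's worker: it executes a file with `runpy`, keeps the JSON-like module globals, serializes them in canonical key order, and compares SHA-256 digests across runs. The helper names `payload_hash` and `is_hash_stable` are hypothetical, and the real check serializes DAG objects rather than arbitrary globals.

```python
import hashlib
import json
import runpy
import tempfile
from pathlib import Path

JSON_TYPES = (str, int, float, bool, list, dict)


def payload_hash(path: Path) -> str:
    """Execute the file once and hash its public, JSON-like module globals."""
    namespace = runpy.run_path(str(path))
    payload = {
        name: value
        for name, value in namespace.items()
        if not name.startswith("_") and isinstance(value, JSON_TYPES)
    }
    # sort_keys gives a canonical key order; repr is a crude fallback for nested objects.
    canonical = json.dumps(payload, sort_keys=True, default=repr)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_hash_stable(path: Path, repeats: int = 2) -> bool:
    """Return True when every parse of the file yields the same hash."""
    return len({payload_hash(path) for _ in range(repeats)}) == 1


with tempfile.TemporaryDirectory() as tmp:
    # A definition that is identical on every parse.
    stable = Path(tmp) / "stable.py"
    stable.write_text('DAG = {"dag_id": "stable", "tasks": ["a", "b"]}\n')

    # A definition that changes on every parse because of a random identifier.
    unstable = Path(tmp) / "unstable.py"
    unstable.write_text('import uuid\nDAG = {"dag_id": "dag_" + uuid.uuid4().hex}\n')

    stable_ok = is_hash_stable(stable)
    unstable_ok = is_hash_stable(unstable)

print(stable_ok, unstable_ok)
```

Unlike the static metrics, this kind of check executes the file, so it should only be pointed at code you trust.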

## Python API

### Basic assertion

```python
from pathlib import Path

from airflow_dag_audit import DagAuditConfig, assert_dag_budget

config = DagAuditConfig(
    max_imports=20,
    max_variable_gets=2,
    max_sql_queries=3,
    max_top_level_calls=10,
    require_stable_hash=True,
)

assert_dag_budget(Path("dags/example_dag.py"), config=config)
```

### Non-raising inspection

```python
from airflow_dag_audit import DagAuditConfig, audit_dag_file

result = audit_dag_file(
    "dags/example_dag.py",
    config=DagAuditConfig(max_imports=20, require_stable_hash=True),
)

print(result.ok)
print(result.metrics.as_dict())
if result.hash_result:
    print(result.hash_result.first_hashes)
```

## Pytest usage

The package exposes a pytest plugin through the `pytest11` entry point.

### Project defaults in `pyproject.toml`

The package supports layered defaults in `[tool.airflow-dag-audit]`.
Use `[tool.airflow-dag-audit.budget]` for global limits and `[[tool.airflow-dag-audit.overrides]]`
for per-glob or per-file exceptions. Later matching overrides win.

```toml
[tool.airflow-dag-audit]
dag_folder = "dags"
include = ["**/*.py"]
exclude = ["**/__pycache__/**", "**/tests/**"]
check_hash_stability = true
hash_parse_repeats = 2

[tool.airflow-dag-audit.budget]
imports = 40
import_froms = 25
variable_get_calls = 10
connection_get_calls = 6
airflow_query_calls = 8
operators = 80
tasks = 150

[[tool.airflow-dag-audit.overrides]]
match = "dags/legacy/*.py"

[tool.airflow-dag-audit.overrides.budget]
imports = 80
variable_get_calls = 25
connection_get_calls = 20
airflow_query_calls = 20
tasks = 300

[[tool.airflow-dag-audit.overrides]]
match = "dags/legacy/specific_bad_but_known.py"
check_hash_stability = false

[tool.airflow-dag-audit.overrides.budget]
imports = 160
import_froms = 90
variable_get_calls = 40
connection_get_calls = 35
airflow_query_calls = 30
operators = 250
tasks = 700
```
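
To make the "later matching overrides win" rule concrete, here is a small sketch of one way such layering can be resolved. It assumes shell-style glob matching via `fnmatchcase` and a key-by-key merge of budget values. Both are assumptions for illustration, as is the helper name `resolve_budget`, and the package's actual matching rules may differ. The values are a subset of the configuration above.

```python
from fnmatch import fnmatchcase

# Subset of the global budget and overrides from the TOML example.
BASE_BUDGET = {"imports": 40, "import_froms": 25, "tasks": 150}

OVERRIDES = [
    {"match": "dags/legacy/*.py", "budget": {"imports": 80, "tasks": 300}},
    {
        "match": "dags/legacy/specific_bad_but_known.py",
        "budget": {"imports": 160, "import_froms": 90, "tasks": 700},
    },
]


def resolve_budget(path: str, base: dict[str, int], overrides: list[dict]) -> dict[str, int]:
    """Apply every matching override in order so that later entries win."""
    budget = dict(base)
    for override in overrides:
        if fnmatchcase(path, override["match"]):
            budget.update(override.get("budget", {}))
    return budget


for dag_path in (
    "dags/new.py",
    "dags/legacy/old.py",
    "dags/legacy/specific_bad_but_known.py",
):
    print(dag_path, resolve_budget(dag_path, BASE_BUDGET, OVERRIDES))
```

In this sketch a file that matches only the first glob keeps the global `import_froms` limit, because an override replaces only the keys it names.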

If `files` is omitted, the package scans `dag_folder` using `include` and `exclude`.

The pytest plugin can also be configured through pytest ini options:

```toml
[tool.pytest.ini_options]
airflow_dag_audit_dag_folder = "dags"
airflow_dag_audit_dag_files = [
  "dags/example_good.py",
  "dags/example_unstable.py",
]
airflow_dag_audit_max_imports = "20"
airflow_dag_audit_max_variable_gets = "2"
airflow_dag_audit_max_sql_queries = "3"
airflow_dag_audit_max_top_level_calls = "10"
airflow_dag_audit_require_stable_hash = "true"
```

### Test code

```python
from airflow_dag_audit import assert_dag_budget


def test_dag_budget(dag_file, dag_audit_config) -> None:
    assert_dag_budget(dag_file, config=dag_audit_config)
```

### Per-test overrides

```python
import pytest

from airflow_dag_audit import assert_dag_budget


@pytest.mark.airflow_dag_budget(max_imports=8, require_stable_hash=False)
def test_small_dag(dag_file, dag_audit_config) -> None:
    assert_dag_budget(dag_file, config=dag_audit_config)
```

### Command-line overrides

```bash
uv run pytest \
  --airflow-dag-file dags/example_good.py \
  --airflow-dag-folder dags \
  --airflow-dag-max-imports 20 \
  --airflow-dag-max-variable-gets 2 \
  --airflow-dag-max-sql-queries 3 \
  --airflow-dag-max-top-level-calls 10 \
  --airflow-dag-require-stable-hash
```

### Why `--airflow-dag-folder` exists

When you point to a single DAG file, the package tries to infer a useful DAG folder automatically.
That works for common layouts, especially when the file lives under a `dags/` directory or inside a Python package.

If the file relies on sibling modules, package-relative imports, or a non-standard repository layout, pass `--airflow-dag-folder` explicitly.
The same applies to `DagAuditConfig(dag_folder=...)` in Python code.
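
The inference described above might look roughly like the following sketch. It is a guess at the general approach, not the package's code: the function name `infer_dag_folder` and the exact order of the two rules are assumptions. It first looks for an ancestor directory named `dags`, then falls back to the directory that contains the outermost Python package.

```python
import tempfile
from pathlib import Path


def infer_dag_folder(dag_file: Path) -> Path:
    """Guess a DAG folder for a single file (hypothetical helper)."""
    dag_file = dag_file.resolve()

    # Rule 1: the nearest ancestor directory named "dags".
    for parent in dag_file.parents:
        if parent.name == "dags":
            return parent

    # Rule 2: climb out of nested packages and return the directory that
    # contains the outermost package, or the file's own directory otherwise.
    folder = dag_file.parent
    while (folder / "__init__.py").is_file() and folder.parent != folder:
        folder = folder.parent
    return folder


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()

    # Layout 1: repo/dags/team/etl.py
    team_dir = root / "repo" / "dags" / "team"
    team_dir.mkdir(parents=True)
    etl = team_dir / "etl.py"
    etl.touch()
    inferred_from_dags = infer_dag_folder(etl) == root / "repo" / "dags"

    # Layout 2: src/pipelines/daily/load.py, where pipelines and daily are packages
    daily_dir = root / "src" / "pipelines" / "daily"
    daily_dir.mkdir(parents=True)
    (root / "src" / "pipelines" / "__init__.py").touch()
    (daily_dir / "__init__.py").touch()
    load = daily_dir / "load.py"
    load.touch()
    inferred_from_package = infer_dag_folder(load) == root / "src"

print(inferred_from_dags, inferred_from_package)
```

Heuristics like these cannot see sibling modules that live outside the guessed folder, which is the case an explicit `--airflow-dag-folder` covers.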

## CLI usage

### Scan without failing the process

```bash
uvx airflow-dag-audit scan dags \
  --max-imports 20 \
  --max-variable-gets 2 \
  --max-sql-queries 3 \
  --max-top-level-calls 10
```

### Fail on budget violations

```bash
uvx airflow-dag-audit assert dags \
  --max-imports 20 \
  --max-variable-gets 2 \
  --max-sql-queries 3 \
  --max-top-level-calls 10 \
  --require-stable-hash
```

### Check only the reparse hash

```bash
uvx airflow-dag-audit hash dags/example_unstable.py --dag-folder dags --show-diff
```

### JSON output

```bash
uvx airflow-dag-audit scan dags --json
```

### Use only `pyproject.toml`

```bash
uvx airflow-dag-audit assert
```

## Development

### Install dependencies

```bash
uv sync --group dev
```

### Run tests

```bash
uv run pytest
```

### Build distributions

```bash
uv run python -m build
```

## Publishing from GitHub tags

The repository includes two workflows:

- `.github/workflows/ci.yml` for tests and package build
- `.github/workflows/publish.yml` for PyPI publishing on version tags

The publishing workflow is written for PyPI Trusted Publishing.

## Examples

The `examples/` directory contains:

- a stable DAG-like file
- an unstable DAG-like file that changes hash across reparses
- a pytest example
