Metadata-Version: 2.4
Name: dbt-guard
Version: 0.1.2
Summary: Column-level lineage breaking change detection for dbt Core CI pipelines
Project-URL: Homepage, https://github.com/damione1/dbt-guard
Project-URL: Issues, https://github.com/damione1/dbt-guard/issues
License: Apache License
        Version 2.0, January 2004
        http://www.apache.org/licenses/
        
        TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
        
        1. Definitions.
        
        "License" shall mean the terms and conditions for use, reproduction,
        and distribution as defined by Sections 1 through 9 of this document.
        
        "Licensor" shall mean the copyright owner or entity authorized by
        the copyright owner that is granting the License.
        
        "Legal Entity" shall mean the union of the acting entity and all
        other entities that control, are controlled by, or are under common
        control with that entity. For the purposes of this definition,
        "control" means (i) the power, direct or indirect, to cause the
        direction or management of such entity, whether by contract or
        otherwise, or (ii) ownership of fifty percent (50%) or more of the
        outstanding shares, or (iii) beneficial ownership of such entity.
        
        "You" (or "Your") shall mean an individual or Legal Entity
        exercising permissions granted by this License.
        
        "Source" form shall mean the preferred form for making modifications,
        including but not limited to software source code, documentation
        source, and configuration and application files.
        
        "Object" form shall mean any form resulting from mechanical
        transformation or translation of a Source form, including but
        not limited to compiled object code, generated documentation,
        and conversions to other media types.
        
        "Work" shall mean the work of authorship made available under
        the License, as indicated by a copyright notice that is included in
        or attached to the work (an example is provided in the Appendix below).
        
        "Derivative Works" shall mean any work, whether in Source or Object
        form, that is based on (or derived from) the Work and for which the
        editorial revisions, annotations, elaborations, or other modifications
        represent, as a whole, an original work of authorship. For the purposes
        of this License, Derivative Works shall not include works that remain
        separable from, or merely link (or bind by name) to the interfaces of,
        the Work and Derivative Works thereof.
        
        "Contribution" shall mean, as submitted to the Licensor for inclusion
        in the Work by the copyright owner or by an individual or Legal Entity
        authorized to submit on behalf of the copyright owner. For the purposes
        of this definition, "submitted" means any form of electronic, verbal,
        or written communication sent to the Licensor or its representatives,
        including but not limited to communication on electronic mailing lists,
        source code control systems, and issue tracking systems that are managed
        by, or on behalf of, the Licensor for the purpose of discussing and
        improving the Work, but excluding communication that is conspicuously
        marked or designated in writing by the copyright owner as "Not a
        Contribution."
        
        "Contributor" shall mean Licensor and any Legal Entity on behalf of
        whom a Contribution has been received by the Licensor and included
        within the Work.
        
        2. Grant of Copyright License. Subject to the terms and conditions of
        this License, each Contributor hereby grants to You a perpetual,
        worldwide, non-exclusive, no-charge, royalty-free, irrevocable
        copyright license to reproduce, prepare Derivative Works of,
        publicly display, publicly perform, sublicense, and distribute the
        Work and such Derivative Works in Source or Object form.
        
        3. Grant of Patent License. Subject to the terms and conditions of
        this License, each Contributor hereby grants to You a perpetual,
        worldwide, non-exclusive, no-charge, royalty-free, irrevocable
        (except as stated in this section) patent license to make, have made,
        use, offer to sell, sell, import, and otherwise transfer the Work,
        where such license applies only to those patent claims licensable
        by such Contributor that are necessarily infringed by their
        Contribution(s) alone or by the combination of their Contribution(s)
        with the Work to which such Contribution(s) was submitted. If You
        institute patent litigation against any entity (including a cross-claim
        or counterclaim in a lawsuit) alleging that the Work or any
        Contribution embodied within the Work constitutes direct or contributory
        patent infringement, then any patent licenses granted to You under
        this License for that Work shall terminate as of the date such
        litigation is filed.
        
        4. Redistribution. You may reproduce and distribute copies of the
        Work or Derivative Works thereof in any medium, with or without
        modifications, and in Source or Object form, provided that You
        meet the following conditions:
        
        (a) You must give any other recipients of the Work or Derivative Works
            a copy of this License; and
        
        (b) You must cause any modified files to carry prominent notices
            stating that You changed the files; and
        
        (c) You must retain, in the Source form of any Derivative Works
            that You distribute, all copyright, patent, trademark, and
            attribution notices from the Source form of the Work,
            excluding those notices that do not pertain to any part of
            the Derivative Works; and
        
        (d) If the Work includes a "NOTICE" text file as part of its
            distribution, You must include a readable copy of the
            attribution notices contained within such NOTICE file, in
            at least one of the following places: within a NOTICE text file
            distributed as part of the Derivative Works; within the Source
            form or documentation, if provided along with the Derivative
            Works; or, within a display generated by the Derivative Works,
            if and wherever such third-party notices normally appear. The
            contents of the NOTICE file are for informational purposes only
            and do not modify the License. You may add Your own attribution
            notices within Derivative Works that You distribute, alongside
            or as an addendum to the NOTICE text from the Work, provided
            that such additional attribution notices cannot be construed
            as modifying the License.
        
        You may add Your own license statement for Your modifications and
        may provide additional grant of rights to use, copy, modify, merge,
        publish, distribute, sublicense, and/or sell copies of the Work,
        and to permit persons to whom the Work is furnished to do so, subject
        to the terms and conditions of this License.
        
        5. Submission of Contributions. Unless You explicitly state otherwise,
        any Contribution intentionally submitted for inclusion in the Work
        by You to the Licensor shall be under the terms and conditions of
        this License, without any additional terms or conditions.
        Notwithstanding the above, nothing herein shall supersede or modify
        the terms of any separate license agreement you may have executed
        with Licensor regarding such Contributions.
        
        6. Trademarks. This License does not grant permission to use the trade
        names, trademarks, service marks, or product names of the Licensor,
        except as required for reasonable and customary use in describing the
        origin of the Work and reproducing the content of the NOTICE file.
        
        7. Disclaimer of Warranty. Unless required by applicable law or
        agreed to in writing, Licensor provides the Work (and each
        Contributor provides its Contributions) on an "AS IS" BASIS,
        WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
        implied, including, without limitation, any warranties or conditions
        of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
        PARTICULAR PURPOSE. You are solely responsible for determining the
        appropriateness of using or reproducing the Work and assume any
        risks associated with Your exercise of permissions under this License.
        
        8. Limitation of Liability. In no event and under no legal theory,
        whether in tort (including negligence), contract, or otherwise,
        unless required by applicable law (such as deliberate and grossly
        negligent acts) or agreed to in writing, shall any Contributor be
        liable to You for damages, including any direct, indirect, special,
        incidental, or exemplary damages of any character arising as a
        result of this License or out of the use or inability to use the
        Work (including but not limited to damages for loss of goodwill,
        work stoppage, computer failure or malfunction, or all other
        commercial damages or losses), even if such Contributor has been
        advised of the possibility of such damages.
        
        9. Accepting Warranty or Additional Liability. While redistributing
        the Work or Derivative Works thereof, You may choose to offer,
        and charge a fee for, acceptance of support, warranty, indemnity,
        or other liability obligations and/or rights consistent with this
        License. However, in accepting such obligations, You may offer only
        conditions consistent with this License.
        
        END OF TERMS AND CONDITIONS
License-File: LICENSE
Keywords: breaking-change,ci,column-lineage,data,dbt,lineage
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.9
Requires-Dist: click>=8.0.0
Requires-Dist: sqlglot>=25.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# dbt-guard

[![PyPI](https://img.shields.io/pypi/v/dbt-guard)](https://pypi.org/project/dbt-guard/)
[![CI](https://github.com/damione1/dbt-guard/actions/workflows/ci.yml/badge.svg)](https://github.com/damione1/dbt-guard/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE)

Column-level lineage breaking change detection for dbt Core CI pipelines.

dbt-guard detects when a model's output columns change in a way that would break downstream consumers, before the code reaches production. It compares two `manifest.json` files (the base branch vs. the PR branch) using static analysis only: no database connection required.

This tool addresses the gap described in [dbt-core issue #6869](https://github.com/dbt-labs/dbt-core/issues/6869): dbt has no built-in mechanism for blocking PRs that silently remove or rename columns that downstream models depend on.

## Quick start

```bash
pip install dbt-guard
```

Then in your CI pipeline, after running `dbt parse` on both the base branch and the PR branch:

```bash
dbt-guard diff \
  --base path/to/base/target/ \
  --current path/to/current/target/ \
  --dialect snowflake \
  --format github
```

Exit code 0 means no breaking changes. Exit code 1 means breaking changes were detected.

## GitHub Actions integration

```yaml
- name: Generate base manifest
  run: |
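    # Parse the project as it exists on the PR's base branch
    git fetch --depth=1 origin "${{ github.base_ref }}"
    git checkout FETCH_HEAD
    dbt parse --profiles-dir . --target ci
    mkdir -p /tmp/base_target && cp target/manifest.json /tmp/base_target/
    # Return to the PR commit that was checked out before
    git checkout -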

- name: Generate current manifest
  run: dbt parse --profiles-dir . --target ci

- name: Column lineage check
  run: |
    dbt-guard diff \
      --base /tmp/base_target \
      --current target/ \
      --dialect snowflake \
      --format github
```

## Bitbucket Pipelines integration

```yaml
pipelines:
  pull-requests:
    '**':
      - step:
          name: Column lineage check
          image: python:3.12
          script:
            - pip install dbt-guard dbt-core dbt-snowflake
            - git fetch origin $BITBUCKET_PR_DESTINATION_BRANCH
            - git checkout FETCH_HEAD
            - dbt parse --profiles-dir . --target ci
            - mkdir -p /tmp/base_target && cp target/manifest.json /tmp/base_target/
            - git checkout -
            - dbt parse --profiles-dir . --target ci
            - dbt-guard diff --base /tmp/base_target --current target/ --dialect snowflake
```

## CLI reference

```
dbt-guard diff [OPTIONS]

Options:
  --base PATH           Directory containing base manifest.json  [required]
  --current PATH        Directory containing current manifest.json  [required]
  --dialect TEXT        SQL dialect: default, snowflake, bigquery, databricks,
                        redshift, trino  [default: default]
  --format TEXT         Output format: text, json, github  [default: text]
  --fail-on TEXT        When to exit non-zero: breaking, any, never
                        [default: breaking]
  --no-impact           Skip downstream impact analysis
  --max-depth INT       Max DAG hops for impact traversal  [default: 10]
  --output PATH         Write report to file instead of stdout
  --select MODEL        Limit diff to specific model names (repeatable)
  --quiet               Print one-line summary only
  --version             Show version and exit
  --help                Show this message and exit
```

### Exit codes

| Code | Meaning |
|------|---------|
| 0 | No breaking changes (or `--fail-on never`) |
| 1 | Breaking changes detected (or any changes with `--fail-on any`) |
| 2 | Tool error (manifest not found, invalid JSON, etc.) |
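
If a pipeline is driven from Python rather than shell, the exit codes above could be interpreted along these lines. This is a minimal sketch: `build_command`, `describe_exit`, and `run_guard` are hypothetical helper names, not part of dbt-guard; only the flags and exit-code meanings come from the reference above.

```python
import subprocess

# Meanings copied from the exit-code table; the dict itself is illustrative.
EXIT_MEANINGS = {
    0: "no breaking changes",
    1: "breaking changes detected",
    2: "tool error",
}


def build_command(base, current, dialect="default", fmt="text"):
    """Assemble the `dbt-guard diff` argv using only documented flags."""
    return [
        "dbt-guard", "diff",
        "--base", base,
        "--current", current,
        "--dialect", dialect,
        "--format", fmt,
    ]


def describe_exit(code):
    """Translate an exit code into a short human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_guard(base, current, **kwargs):
    """Run dbt-guard (assumed to be installed and on PATH)."""
    result = subprocess.run(build_command(base, current, **kwargs))
    return result.returncode, describe_exit(result.returncode)
```

Treating code 2 separately from code 1 lets a wrapper distinguish "the PR is breaking" from "the check itself could not run".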

## How it works

1. **Parse both manifests.** dbt-guard reads `manifest.json` from the base and current target directories. No dbt execution, no database connection.

2. **Extract column inventories.** For each model, it reads the documented columns from `manifest.json`. If compiled SQL is present on disk (in `target/compiled/`), it additionally parses the SQL with [SQLGlot](https://github.com/tobymao/sqlglot) to detect undocumented columns.

3. **Diff columns.** For each model present in both manifests, it compares column sets:
   - Column removed → breaking
   - Column renamed (1 removed + 1 added, matching type) → breaking
   - Column type changed (only when documented on both sides) → breaking
   - Column added → non-breaking

4. **Impact analysis.** For each breaking change, it traverses the `child_map` in the manifest via BFS to find downstream models affected transitively.

5. **Report.** Output in text, JSON, or GitHub Actions annotation format.
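
Steps 2 and 4 can be illustrated with a short sketch. It is not dbt-guard's actual implementation: the function names are hypothetical, and it assumes the usual manifest layout in which `nodes` maps unique IDs to nodes with `resource_type` and `columns`, and `child_map` maps each unique ID to its direct children.

```python
from collections import deque


def documented_columns(manifest):
    """Map each model's unique_id to {column_name: data_type or None}."""
    inventory = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, ...
        inventory[unique_id] = {
            name: meta.get("data_type")
            for name, meta in node.get("columns", {}).items()
        }
    return inventory


def downstream_models(manifest, start, max_depth=10):
    """Breadth-first walk of child_map; returns {unique_id: hops from start}.

    Assumption for this sketch: only children whose ID starts with
    "model." are followed, so tests attached to a model are ignored.
    """
    child_map = manifest.get("child_map", {})
    hops = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for child in child_map.get(node, []):
            if child.startswith("model.") and child not in hops:
                hops[child] = depth + 1
                queue.append((child, depth + 1))
    return hops
```

The `max_depth` parameter mirrors the idea behind `--max-depth`: a model two hops away is reported only if the limit is at least 2.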

## What counts as breaking vs. non-breaking

Breaking changes will cause downstream consumers to fail or produce incorrect results:

| Change | Breaking? | Why |
|--------|-----------|-----|
| Column removed | Yes | Downstream SELECT or JOIN on that column will fail |
| Column renamed | Yes | All references to the old name break |
| Column type changed | Yes | Implicit casts may fail or produce wrong results |
| Column added | No | Additive; downstream consumers are unaffected |
| New model added | No | Nothing depends on it yet |
| Model removed from current | No | Not diffed; dbt will surface this as a ref() error |
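
The classification in the table can be sketched as a small function over two column inventories. This is an illustration under stated assumptions, not the tool's real code: `diff_columns` and its tuple format are hypothetical, each inventory is a `{column_name: data_type or None}` dict, and two undocumented types (`None` on both sides) are treated as matching for the rename check.

```python
def diff_columns(base, current):
    """Compare two column inventories for one model.

    Returns a list of (kind, detail, breaking) tuples.
    """
    removed = sorted(set(base) - set(current))
    added = sorted(set(current) - set(base))
    events = []

    # Rename heuristic: exactly one removal and one addition, same type.
    if len(removed) == 1 and len(added) == 1 and base[removed[0]] == current[added[0]]:
        events.append(("renamed", f"{removed[0]} -> {added[0]}", True))
        removed, added = [], []

    events += [("removed", name, True) for name in removed]
    events += [("added", name, False) for name in added]

    # A type change counts only when both sides document a data_type.
    for name in sorted(set(base) & set(current)):
        old, new = base[name], current[name]
        if old is not None and new is not None and old != new:
            events.append(("type_changed", f"{name}: {old} -> {new}", True))
    return events
```

For example, going from `{"id": "integer", "name": "text"}` to `{"id": "bigint", "full_name": "text"}` yields one rename event and one type-change event, both breaking.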

## Limitations

**SELECT * expansion.** When a model ends with `SELECT * FROM final_cte`, dbt-guard tries to resolve the star by tracing back through the CTE chain. If the star references a physical table (not a CTE), expansion fails and dbt-guard falls back to documented columns from schema.yml.

**No catalog required.** dbt-guard does not need `catalog.json` (the output of `dbt docs generate`). Column types are taken from schema.yml documentation when available. Type-change detection only fires when both the base and current sides have a documented `data_type`. Models where columns are entirely undocumented are still diffed by column name (removal/addition), just not by type.

**Parse-only manifests.** `dbt parse` does not compile SQL (compiled files are absent). In this mode, dbt-guard works exclusively from documented columns. Run `dbt compile` instead of `dbt parse` to enable SQL-based column extraction.
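
A CI script can check up front which mode it will get. The helper below is a hypothetical sketch, not something dbt-guard ships; it assumes compiled files live under `compiled/` inside the target directory, as described earlier.

```python
from pathlib import Path


def has_compiled_sql(target_dir):
    """True if <target_dir>/compiled contains at least one .sql file."""
    compiled = Path(target_dir) / "compiled"
    return compiled.is_dir() and any(compiled.rglob("*.sql"))
```

If this returns `False` for a target directory, only documented columns are available for that side of the diff.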

**Rename heuristic.** The rename detection (1 removed + 1 added with matching type) is a best-effort heuristic. If a PR removes one column from a model and adds an unrelated column of the same type, dbt-guard reports the pair as a rename. Either way the change is flagged as breaking. Use `--format json` to inspect the raw events.

**Column ordering.** dbt-guard does not detect column reordering. Moving a column within a SELECT is harmless for consumers that reference columns by name, but it can break consumers that depend on column position (e.g. `SELECT * FROM upstream` in the middle of a CTE). This is a known gap.

## Contributing

Contributions are welcome. The project uses standard Python tooling:

```bash
# Clone and install in editable mode with dev dependencies
git clone https://github.com/damione1/dbt-guard
cd dbt-guard
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check dbt_guard/

# Type check
mypy dbt_guard/
```

The test suite uses synthetic manifest fixtures in `tests/fixtures/manifests/`. To add a new test scenario, add a manifest pair there and write the corresponding test.

Key design decisions:
- Minimal dependencies: only `sqlglot` and `click`. No pandas, no dbt-core.
- Graceful degradation: if SQL parsing fails, fall back to documented columns rather than raising.
- Static analysis only: no database connection, no `dbt run` needed.

## License

Apache 2.0. See [LICENSE](LICENSE).
