Metadata-Version: 2.4
Name: dbt-semguard
Version: 0.5.3
Summary: Catch semantic breaking changes in dbt metrics before they land in production.
Project-URL: Repository, https://github.com/yeaight7/dbt-semguard
Project-URL: Issues, https://github.com/yeaight7/dbt-semguard/issues
Project-URL: Changelog, https://github.com/yeaight7/dbt-semguard/blob/main/CHANGELOG.md
Project-URL: Documentation, https://github.com/yeaight7/dbt-semguard#readme
Author-email: yeaight7 <rivero4javier@outlook.es>
License: MIT
License-File: LICENSE
Keywords: data-quality,dbt,github-actions,metrics,semantic-layer
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.11
Requires-Dist: pyyaml<7,>=6.0
Provides-Extra: dev
Requires-Dist: pytest<9,>=8.2; extra == 'dev'
Description-Content-Type: text/markdown

# dbt-semguard

Catch semantic breaking changes in dbt metrics before they land in production.

`dbt-semguard` is a CLI-first semantic change detector for dbt Semantic Layer definitions. It compares two versions of the semantic contract, classifies changes as `breaking`, `risky`, or `safe`, and renders local or GitHub-friendly output without requiring warehouse access or dbt runtime internals.

## What Is This For?

`dbt-semguard` is a semantic PR guard for dbt metrics and semantic models.

It answers one question:

> What changed in the meaning of this metric?

That matters because many dbt changes are valid from a parser or build point of view, but still dangerous for downstream consumers.

For example, a PR may:

- change `gross_revenue` from `sum(order_total)` to `avg(order_total)`
- remove a dimension people use to slice a KPI
- change a ratio metric denominator
- widen or narrow a metric filter
- change entity or time-grain semantics

In all of those cases, dbt may still parse successfully and CI may still be green. But the business meaning of the metric has changed, and dashboards, notebooks, reverse ETL jobs, or APIs may silently start returning different answers.

`dbt-semguard` exists to catch that class of change before it reaches production.

## What It Does Exactly

`dbt-semguard` does not lint YAML style and it does not validate warehouse execution.

Instead, it:

1. reads the dbt Semantic Layer definition from two inputs
2. extracts only the semantic parts that affect meaning
3. builds a canonical contract for each side
4. diffs those contracts
5. classifies each change as `breaking`, `risky`, or `safe`
6. renders the result for local CLI use or GitHub Actions

In practical terms, it helps teams review semantic changes the same way they already review code changes.
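
As a rough illustration of steps 3 to 5, here is a minimal Python sketch of diffing two contracts and attaching a severity to each change. Everything in it is an assumption made for the example: the contract shape, the `diff_metrics` helper, and the `ASSUMED_SEVERITY` mapping are hypothetical and are not the tool's internal API or its actual severity rules.

```python
# Hypothetical contract shape: metric name -> fields that affect meaning.
base = {
    "gross_revenue": {"type": "simple", "agg": "sum", "expr": "order_total"},
    "order_count": {"type": "simple", "agg": "count", "expr": "order_id"},
}
head = {
    "gross_revenue": {"type": "simple", "agg": "avg", "expr": "order_total"},
    "net_revenue": {"type": "simple", "agg": "sum", "expr": "order_total - refunds"},
}

# Illustrative mapping only; the real rules live in the project's severity docs.
ASSUMED_SEVERITY = {"removed": "breaking", "changed": "breaking", "added": "safe"}


def diff_metrics(base: dict, head: dict) -> list[tuple[str, str, str]]:
    """Return (severity, metric, message) tuples for two contract dicts."""
    changes = []
    for name in sorted(base.keys() - head.keys()):
        changes.append((ASSUMED_SEVERITY["removed"], name, "metric removed"))
    for name in sorted(head.keys() - base.keys()):
        changes.append((ASSUMED_SEVERITY["added"], name, "metric added"))
    for name in sorted(base.keys() & head.keys()):
        # Compare field by field so each message names the exact change.
        for field in sorted(base[name].keys() | head[name].keys()):
            old, new = base[name].get(field), head[name].get(field)
            if old != new:
                message = f"{field} changed from {old!r} to {new!r}"
                changes.append((ASSUMED_SEVERITY["changed"], name, message))
    return changes


changes = diff_metrics(base, head)
for severity, name, message in changes:
    print(f"[{severity}] {name}: {message}")
```

The point of the sketch is the shape of the pipeline: compare canonical structures rather than raw YAML text, then map each difference to a severity.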

## How It Works

The tool reduces dbt semantic definitions into a normalized contract that is easier to compare than raw YAML.

It keeps fields that affect meaning, such as:

- semantic model identity
- backing model name
- entities and entity types
- dimensions and time granularity
- metric type
- aggregation and expression
- filters
- ratio numerator and denominator

It intentionally ignores noise such as:

- descriptions
- docs blocks
- YAML ordering
- whitespace and comments

That means the output is focused on semantic drift, not formatting drift.
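
The idea of a normalized contract can be sketched in a few lines of Python. This is only an illustration: the `NOISE_KEYS` set, the `normalize` and `canonical` helpers, and the metric field names are hypothetical and do not reflect the tool's real contract format.

```python
import json

# Keys treated as noise in this sketch (hypothetical names mirroring the list above).
NOISE_KEYS = {"description", "docs"}


def normalize(node):
    """Drop noise keys and sort mapping keys so key order does not affect comparison."""
    if isinstance(node, dict):
        return {k: normalize(v) for k, v in sorted(node.items()) if k not in NOISE_KEYS}
    if isinstance(node, list):
        return [normalize(item) for item in node]
    return node


def canonical(node) -> str:
    """Serialize the normalized structure into a stable string for comparison."""
    return json.dumps(normalize(node), sort_keys=True)


before = {"name": "gross_revenue", "description": "Total revenue", "agg": "sum", "expr": "order_total"}
# Same meaning: keys reordered and description reworded.
after_cosmetic = {"expr": "order_total", "agg": "sum", "name": "gross_revenue", "description": "Reworded"}
# Different meaning: aggregation changed.
after_semantic = {"name": "gross_revenue", "description": "Total revenue", "agg": "avg", "expr": "order_total"}

print(canonical(before) == canonical(after_cosmetic))  # True
print(canonical(before) == canonical(after_semantic))  # False
```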

[//]: # (## How To Explain It To A Data Team)

[//]: # (Short version:)

[//]: # (> `dbt-semguard` tells you whether a PR changes the meaning of a metric, not just its code.)

[//]: # (Slightly longer version:)

[//]: # (> It compares the dbt Semantic Layer before and after a PR, strips away cosmetic YAML changes, and highlights only the changes that can affect how downstream users interpret or query a KPI.)

## Install From PyPI

```bash
python -m pip install dbt-semguard
```

`dbt-semguard` requires Python 3.11 or newer.

## Install From GitHub

```bash
python -m pip install "git+https://github.com/yeaight7/dbt-semguard.git@v0.5.3"
```

Use the GitHub install path when you need to pin directly to a repository tag.

## Install From Source

```bash
git clone https://github.com/yeaight7/dbt-semguard.git
cd dbt-semguard
python -m pip install .
```

## How To Use It

### Run locally before opening a PR

Use this when you want to sanity-check semantic changes while you are still developing:

```bash
semguard diff --base-ref main --head-ref HEAD --project-dir .
semguard check --base-ref main --head-ref HEAD --project-dir . --fail-on breaking
```

Typical use:

- `diff` when you want to inspect what changed
- `check` when you want a blocking exit code for automation or local scripts
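
If you want to call `check` from a local script, a small Python wrapper could look like the sketch below. The flags are the ones shown above; the helper names `build_check_command` and `run_check` are hypothetical, and the sketch assumes that a non-zero exit code means the threshold was reached or the run failed.

```python
import shlex
import subprocess
import sys


def build_check_command(base_ref: str, head_ref: str, project_dir: str,
                        fail_on: str = "breaking") -> list[str]:
    """Assemble the `semguard check` invocation as an argument list."""
    return [
        "semguard", "check",
        "--base-ref", base_ref,
        "--head-ref", head_ref,
        "--project-dir", project_dir,
        "--fail-on", fail_on,
    ]


def run_check(base_ref: str = "main", head_ref: str = "HEAD", project_dir: str = ".") -> int:
    """Run the check and return its exit code (assumed non-zero when blocking)."""
    result = subprocess.run(build_check_command(base_ref, head_ref, project_dir))
    if result.returncode != 0:
        print("semguard check did not pass; review the report first.", file=sys.stderr)
    return result.returncode


# Show the command that would be executed, without running it.
print(shlex.join(build_check_command("main", "HEAD", ".")))
```

In a real script you would end with something like `sys.exit(run_check())` so the wrapper propagates the exit code.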

For monorepos, always point `--project-dir` at the dbt project root you want to analyze:

```bash
semguard diff --base-ref main --head-ref HEAD --project-dir analytics/dbt
```

Both git ref mode and local YAML mode scope file discovery to this directory.

### Compare exported contracts directly

Use this when you want to compare two precomputed semantic contracts:

```bash
semguard diff --base-contract base-contract.json --head-contract head-contract.json --format markdown
```

### Compare manifests explicitly

Use this when your workflow already has dbt `semantic_manifest.json` artifacts available:

```bash
semguard diff --base-manifest base-semantic-manifest.json --head-manifest head-semantic-manifest.json --format json
```

### Extract a contract

Use this when you want a stable machine-readable snapshot of semantic meaning:

```bash
semguard extract --source yaml --project-dir examples/ecommerce_dbt_project --output base-contract.json
semguard extract --source manifest --manifest semantic_manifest.json --output manifest-contract.json
```

### Configure YAML discovery with `.semguard.yml`

Create `.semguard.yml` in your dbt project root to control which YAML files are scanned:

```yaml
include:
  - models/**/*.yml
  - models/**/*.yaml
  - metrics/**/*.yml
  - metrics/**/*.yaml
  - semantic_models/**/*.yml
  - semantic_models/**/*.yaml
exclude:
  - target/**
  - dbt_packages/**
  - .venv/**
  - .github/**
```

If `.semguard.yml` is not present, the patterns above are applied automatically as the defaults.
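
To make the include/exclude behaviour concrete, here is a minimal Python sketch of how such patterns could select files. It is not the tool's implementation: the `glob_to_regex` and `is_scanned` helpers are hypothetical, and the sketch assumes that `**/` matches zero or more directories and that exclude patterns win over include patterns.

```python
import re

DEFAULT_INCLUDE = [
    "models/**/*.yml", "models/**/*.yaml",
    "metrics/**/*.yml", "metrics/**/*.yaml",
    "semantic_models/**/*.yml", "semantic_models/**/*.yaml",
]
DEFAULT_EXCLUDE = ["target/**", "dbt_packages/**", ".venv/**", ".github/**"]


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob into a regex; `**/` is assumed to match zero or more directories."""
    parts, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # a single `*` stays within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_scanned(path: str, include=DEFAULT_INCLUDE, exclude=DEFAULT_EXCLUDE) -> bool:
    """A path is scanned when it matches an include pattern and no exclude pattern."""
    if any(glob_to_regex(p).fullmatch(path) for p in exclude):
        return False
    return any(glob_to_regex(p).fullmatch(path) for p in include)


for candidate in ["models/marts/orders.yml", "target/compiled/orders.yml", "README.md"]:
    print(candidate, is_scanned(candidate))
```

Paths are written relative to the dbt project root, matching the way the patterns in `.semguard.yml` are expressed.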


## Example Review Flow

1. A developer changes a metric or semantic model in dbt.
2. `semguard diff` compares the base branch and the current branch.
3. The tool reports semantic changes only.
4. The team decides whether the change is acceptable, needs migration planning, or should be blocked.
5. In CI, `semguard check --fail-on breaking` can fail the PR automatically.

## How To Read The Result

- `breaking`: the semantic meaning changed in a way that should usually block by default
- `risky`: the change may be legitimate, but downstream consumers should review it
- `safe`: cosmetic-only changes that do not appear in the semantic diff

## Output

`diff` and `check` emit one of:

- `text`
- `markdown`
- `json`

JSON reports contain:

- `summary`
- `highest_severity`
- `blocking`
- `changes`
- `metadata`

### Example Markdown report

```md
## dbt-semguard report

### Breaking changes
#### Metric `gross_revenue`
- Metric `gross_revenue` changed aggregation from `sum` to `avg`.

Status: blocking
```

### Example JSON report

```json
{
  "summary": {
    "breaking": 3,
    "risky": 1,
    "safe": 0
  },
  "highest_severity": "breaking",
  "blocking": true
}
```
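
A downstream script can consume the JSON report with nothing more than the standard library. The sketch below uses only the fields shown in the example above; the `summarize` helper and the idea of mapping `blocking` to an exit code are assumptions for illustration.

```python
import json

# Same shape as the example report above; `changes` and `metadata` are omitted.
report_text = """
{
  "summary": {"breaking": 3, "risky": 1, "safe": 0},
  "highest_severity": "breaking",
  "blocking": true
}
"""


def summarize(report: dict) -> tuple[str, int]:
    """Return a one-line summary and a suggested exit code (1 when blocking)."""
    total = sum(report["summary"].values())
    line = f"{total} semantic change(s), highest severity: {report['highest_severity']}"
    return line, 1 if report["blocking"] else 0


line, exit_code = summarize(json.loads(report_text))
print(line)
print("exit code:", exit_code)
```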

## Coverage

`dbt-semguard` currently covers the highest-value semantic changes in the latest dbt Semantic Layer spec.

Covered extractors and inputs:

- Latest-spec YAML projects
- Legacy top-level `semantic_models` / `metrics` YAML projects
- Explicit dbt `semantic_manifest.json` input
- Canonical contract JSON emitted by `semguard extract`

Covered semantic comparisons:

- Semantic model add/remove and backing model changes
- Semantic model default aggregation time dimension changes
- Entity add/remove, type changes, and expression changes
- Dimension add/remove, type changes, expression changes, and time granularity changes
- Simple metric aggregation, expression, label, filter, ownership, aggregation-time, and non-additive changes
- Ratio metric numerator and denominator changes
- Derived metric expression and input metric changes
- Cumulative metric input, window, grain-to-date, and period-aggregation changes
- Conversion metric entity, calculation, base metric, conversion metric, and constant-property changes
- Additive changes such as new entities, new dimensions, and new metrics

Current automated coverage:

- YAML extraction for the latest spec
- Manifest normalization
- Semantic diff severity mapping for breaking and risky changes
- Declarative field-coverage policy so contract fields are explicitly diffed, nested, or intentionally excluded
- Source diagnostics in extracted YAML contracts and change reports
- CLI `extract`, `diff`, and `check`
- Sticky PR comment delivery through the GitHub Action
- Checkout-free git ref mode
- Pre-release local action smoke coverage in CI, plus post-release published action smoke coverage in both git-ref and manifest modes, including spaced manifest paths

## Current Limitations

Known `v0.5.3` limitations are intentionally narrow:

- There is no allowlist for intentional semantic changes yet.
- Manifest parsing expects dbt `semantic_manifest.json`, not the general-purpose dbt `manifest.json` artifact.
- Legacy YAML support covers top-level `semantic_models`, `measures`, and `type_params`, but cross-project ref semantics are still normalized conservatively into the single `model_name` contract field.
- Rename handling is intentionally conservative: a rename is treated as a removal plus an addition.
- Source diagnostics are best-effort and currently strongest for YAML extraction; manifest-derived contracts may still lack file/line detail.
- GitHub integration supports sticky PR comments and inline annotations for `pull_request` workflows, but does not yet manage review-thread lifecycles.
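
The rename limitation follows naturally from comparing names as sets. A minimal Python sketch, assuming a hypothetical `diff_names` helper rather than the tool's real code:

```python
def diff_names(base: set[str], head: set[str]) -> dict[str, list[str]]:
    """Compare names only; no attempt is made to pair a removed name with an added one."""
    return {
        "removed": sorted(base - head),
        "added": sorted(head - base),
    }


# `gross_revenue` is renamed to `total_revenue`; the definition itself is unchanged.
result = diff_names({"gross_revenue", "order_count"}, {"total_revenue", "order_count"})
print(result)
```

Without extra matching logic, the comparison cannot tell a rename from an unrelated removal and addition, so reporting both is the conservative choice.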

## Use As A GitHub Action

Use the included composite action from this repository:

```yaml
jobs:
  semguard:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
      pull-requests: read
      checks: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: yeaight7/dbt-semguard@v0.5.3
        id: semguard
        with:
          base-ref: ${{ github.event.pull_request.base.sha }}
          head-ref: ${{ github.sha }}
          fail-on: breaking
          pr-comment: true
          pr-comment-mode: sticky
          github-token: ${{ github.token }}

      - name: Inspect semguard outputs
        run: |
          echo "Highest severity: ${{ steps.semguard.outputs.highest-severity }}"
          echo "Blocking: ${{ steps.semguard.outputs.blocking }}"
```

The action now exposes structured outputs so downstream CI can branch on semantic severity without reparsing JSON:

- `steps.semguard.outputs.highest-severity`
- `steps.semguard.outputs.blocking`
- `steps.semguard.outputs.breaking-count`
- `steps.semguard.outputs.risky-count`
- `steps.semguard.outputs.safe-count`

`pr-comment-mode` accepts:

- `sticky`: update the previous dbt-semguard PR comment when one already exists
- `create`: always publish a new PR comment instead of updating the previous one

The action writes:

- a Markdown summary to the workflow summary
- a JSON artifact named `semguard-report`
- structured step outputs for severity and counts
- an optional PR comment when `pr-comment: true`, updated in place or newly created depending on `pr-comment-mode`
- inline check-run annotations when source diagnostics are available
- a failing status when the configured threshold is reached

When there are zero semantic changes, the Markdown artifact and workflow summary explicitly include `No semantic changes detected.` followed by `Status: passing`.

This is the recommended setup when you want the semantic review to happen automatically on every PR.

If you enable `pr-comment: true`, the workflow needs:

- `contents: read`
- `issues: write`
- `pull-requests: read`
- `checks: write`

Missing `checks: write` can prevent inline annotations and check runs from appearing even when the semantic diff succeeds.

For forked pull requests, the standard `pull_request` event usually does not get a write-capable `GITHUB_TOKEN`, so sticky PR comments and check-run annotations may be unavailable unless you adopt a separate trusted workflow pattern.

## Troubleshooting

Common CI and configuration issues are covered in [docs/troubleshooting.md](docs/troubleshooting.md).

## Migration notes (`v0.5.3`)

- Git ref extraction now scopes strictly to `--project-dir` for monorepos.
- YAML discovery now uses safe default include/exclude patterns.
- Optional `.semguard.yml` include/exclude rules are applied in both local and git-ref YAML extraction.
- Invalid semantic YAML now raises user-facing errors with source context instead of raw `KeyError` tracebacks.
- Composite action shell steps now read user-controlled values from environment variables instead of embedding GitHub expressions directly in Bash.
- Composite action now generates JSON, Markdown, summary text, and step outputs in a single pass before enforcing the blocking threshold.
- Composite action report files now live in an isolated runner temp directory derived from `artifact-name`, which avoids workspace filename collisions in matrix-style CI jobs.
- The repository now documents security reporting, contribution setup, and common action troubleshooting paths.

## Example project

An example latest-spec dbt project lives in [examples/ecommerce_dbt_project](examples/ecommerce_dbt_project).

## Documentation

- [Contract spec](docs/contract-spec.md)
- [How to use and explain dbt-semguard](docs/how-to-use.md)
- [Severity rules](docs/severity-rules.md)
- [Troubleshooting](docs/troubleshooting.md)
- [Roadmap](docs/roadmap.md)
- [Changelog](CHANGELOG.md)
- [Contributing](CONTRIBUTING.md)
- [Security policy](SECURITY.md)

## License

This project is open source under the MIT License. See [LICENSE](LICENSE).
