Metadata-Version: 2.4
Name: semantic-view-guard
Version: 0.1.0
Summary: Lint and score Snowflake semantic models (native SEMANTIC VIEW DDL and Cortex Analyst YAML) for correctness and AI-readiness.
Project-URL: Homepage, https://github.com/KarthikRajashekaran/semantic-view-guard
Project-URL: Issues, https://github.com/KarthikRajashekaran/semantic-view-guard/issues
Author: Karthik Rajashekaran
License: Apache-2.0
License-File: LICENSE
Keywords: ai-readiness,cortex,data-quality,dbt,linter,semantic-layer,semantic-view,snowflake
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.10
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# semantic-view-guard

> Lint and score **Snowflake semantic models** — native `SEMANTIC VIEW` DDL *and* Cortex Analyst YAML — for correctness and **AI-readiness**. Warehouse-free. Works in GitHub **and** GitLab CI.

`svguard` is a CLI + CI integration that treats your semantic layer like code. It parses your semantic views, runs lint rules (missing comments, missing synonyms, orphan relationships, tables without primary keys), and produces a transparent **AI-readiness score (0–100)**, so that a Cortex agent works from clean, well-described, joinable metadata.

It fills a gap in the ecosystem:

| Tool | What it governs |
|---|---|
| [dbt-semguard](https://github.com/yeaight7/dbt-semguard) | dbt **MetricFlow** semantic layer YAML |
| [dbt-sf-ai](https://github.com/ian-andriot/dbt-sf-ai) | **Creates** Snowflake AI objects (doesn't govern them) |
| **semantic-view-guard** | **Snowflake `SEMANTIC VIEW` DDL + Cortex Analyst YAML** — correctness + AI-readiness |

## Why

Snowflake semantic views power Cortex Analyst / Agents. Their quality (comments, synonyms, keys, relationships) directly determines how well an agent answers natural-language questions — but nothing lints them today. `svguard` makes that quality visible and enforceable in CI.

## Install

```bash
pip install semantic-view-guard
```

## Quick start

```bash
# From a dbt manifest (reads materialized='semantic_view' nodes)
svguard lint --manifest target/manifest.json

# Zero dbt / zero warehouse — parse raw .sql model files directly
svguard lint --sql "models/**/semantic_views/**/*.sql"

# Cortex Analyst YAML semantic models
svguard lint --analyst-yaml "semantic_models/**/*.yml"

# Mix sources, gate on score, emit JSON
svguard lint --manifest target/manifest.json \
             --analyst-yaml "semantic_models/**/*.yml" \
             --min-score 80 --reporter json --output svguard.json
```

The exit code is non-zero if any rule configured as `error` fires, or if any view scores below `--min-score`.
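
The gating rule above can be sketched as a small pure function. This is a minimal illustration, not `svguard`'s internal API: `gate_exit_code` is a hypothetical name, findings are assumed to be dicts carrying a `severity` key, scores are assumed to be a per-view mapping, and "below" is read as strictly less than the threshold.

```python
def gate_exit_code(findings, scores, min_score=0):
    """Return the process exit code for a lint run (hypothetical helper).

    findings: iterable of dicts like {"code": "SV001", "severity": "error" | "warn"}
    scores:   mapping of view name -> AI-readiness score (0-100)
    """
    # Any finding from a rule configured as `error` fails the run.
    has_error = any(f["severity"] == "error" for f in findings)
    # Any view scoring strictly below the threshold also fails the run.
    below_threshold = [view for view, score in scores.items() if score < min_score]
    return 1 if has_error or below_threshold else 0


# Usage: a warning alone passes, but a view under --min-score 80 fails.
print(gate_exit_code([{"severity": "warn"}], {"sv_orders": 92}, min_score=80))  # 0
print(gate_exit_code([{"severity": "warn"}], {"sv_orders": 71}, min_score=80))  # 1
```

Keeping the decision separate from I/O means the same function can back every reporter; the CLI would only pass its result to `sys.exit`.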

## Example output

Excerpt of the markdown reporter output (3 of the 13 views shown):

```
## semantic-view-guard

**FAILED** — 13 view(s), 4 error(s), 0 warning(s)

| Semantic view | Source | Score | Errors | Warnings | Status |
|---|---|---:|---:|---:|---|
| `sv_fact_arr`             | ddl | 100 | 0 | 0 | PASS |
| `sv_fact_avatax_txn`      | ddl | 100 | 1 | 0 | FAIL |
| `sv_hex_connections`      | ddl | 100 | 0 | 0 | PASS |
```

## How it works

```
 SEMANTIC VIEW DDL  ──►  DDL parser   ─┐
 (manifest / .sql)                     ├─►  Canonical model ──► Rules engine ──► Findings + Score ──► Reporter
 Cortex Analyst YAML ─►  YAML parser  ─┘     (tables, columns,    (source-aware)      (markdown │ json │ gitlab)
                                              comments, synonyms,
                                              relationships, keys)
```
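
The first input path in the diagram reads semantic-view models from a dbt manifest. Selecting them could be sketched as follows, assuming dbt's `manifest.json` layout: a top-level `nodes` mapping whose entries carry `name`, `config.materialized`, and `raw_code` / `compiled_code` (named `raw_sql` in manifests older than dbt 1.3). This README does not say whether `svguard` reads the raw or the compiled body, and `load_manifest` / `select_semantic_views` are hypothetical names.

```python
import json


def load_manifest(path):
    """Read a dbt manifest.json from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def select_semantic_views(manifest):
    """Map model name -> SQL body for nodes materialized as 'semantic_view'."""
    views = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("config", {}).get("materialized") != "semantic_view":
            continue
        # compiled_code only exists after `dbt compile`; raw_code may still contain Jinja.
        body = node.get("compiled_code") or node.get("raw_code") or node.get("raw_sql") or ""
        views[node.get("name", unique_id)] = body
    return views


# Usage with an in-memory manifest fragment, trimmed to the fields used above.
manifest = {
    "nodes": {
        "model.shop.sv_orders": {
            "name": "sv_orders",
            "config": {"materialized": "semantic_view"},
            "raw_code": "TABLES (orders PRIMARY KEY (order_id))",
        },
        "model.shop.stg_orders": {"name": "stg_orders", "config": {"materialized": "view"}},
    }
}
print(select_semantic_views(manifest))
```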

The DDL parser is a tolerant, hand-rolled section parser (no warehouse connection, no full SQL grammar). It splits each view body into its `TABLES / RELATIONSHIPS / FACTS / DIMENSIONS / METRICS / COMMENT` clauses and extracts `PRIMARY KEY`, `WITH SYNONYMS`, and `COMMENT` from their entries. A fragment it cannot parse becomes an `SV000` parse warning rather than a crash.
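
A minimal sketch of that tolerant approach, covering only the `TABLES` clause: a quote-aware parenthesis matcher finds each clause, a depth-0 comma split yields its entries, and regexes pull out `PRIMARY KEY`, `WITH SYNONYMS`, and `COMMENT`. All function names are hypothetical and the regexes are simplified assumptions (for example, a `)` inside a synonym string would confuse `SYN_RE`), so treat this as an illustration of the technique rather than `svguard`'s parser.

```python
import re

CLAUSES = ("TABLES", "RELATIONSHIPS", "FACTS", "DIMENSIONS", "METRICS")
NAME_RE = re.compile(r'([\w.$"]+)(?:\s+AS\s+[\w.$"]+)?', re.IGNORECASE)  # alias AS table, or bare table
PK_RE = re.compile(r"PRIMARY\s+KEY\s*\(([^)]*)\)", re.IGNORECASE)
SYN_RE = re.compile(r"WITH\s+SYNONYMS\s*=?\s*\(([^)]*)\)", re.IGNORECASE)
COMMENT_RE = re.compile(r"COMMENT\s*=\s*'((?:[^']|'')*)'", re.IGNORECASE)
STRING_RE = re.compile(r"'((?:[^']|'')*)'")


def matching_paren(text, open_idx):
    """Index of the ')' closing text[open_idx], ignoring parens inside '...'; -1 if unbalanced."""
    depth, quoted = 0, False
    for i in range(open_idx, len(text)):
        ch = text[i]
        if ch == "'":
            quoted = not quoted
        elif not quoted and ch == "(":
            depth += 1
        elif not quoted and ch == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def split_top_level(text):
    """Split on commas at parenthesis depth 0 and outside single-quoted strings."""
    parts, buf, depth, quoted = [], [], 0, False
    for ch in text:
        if ch == "'":
            quoted = not quoted
        elif not quoted and ch in "()":
            depth += 1 if ch == "(" else -1
        if ch == "," and depth == 0 and not quoted:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return [p for p in parts if p]


def parse_body(body):
    """Return (tables, clauses, warnings); bad fragments become SV000 warnings, not exceptions."""
    clauses, warnings = {}, []
    for clause in CLAUSES:
        match = re.search(rf"\b{clause}\s*\(", body, re.IGNORECASE)
        if match is None:
            continue
        open_idx = match.end() - 1
        close_idx = matching_paren(body, open_idx)
        if close_idx == -1:
            warnings.append(("SV000", f"unbalanced parentheses in {clause}"))
            continue
        clauses[clause] = split_top_level(body[open_idx + 1:close_idx])

    tables = {}
    for entry in clauses.get("TABLES", []):
        name_match = NAME_RE.match(entry)
        if name_match is None:
            warnings.append(("SV000", f"unparsable table entry: {entry[:40]!r}"))
            continue
        pk, syn, comment = PK_RE.search(entry), SYN_RE.search(entry), COMMENT_RE.search(entry)
        tables[name_match.group(1).split(".")[-1].strip('"')] = {
            "primary_key": [c.strip() for c in pk.group(1).split(",")] if pk else [],
            "synonyms": STRING_RE.findall(syn.group(1)) if syn else [],
            "comment": comment.group(1) if comment else None,
        }
    return tables, clauses, warnings


ddl = """
CREATE OR REPLACE SEMANTIC VIEW sv_orders
  TABLES (
    orders AS db.sch.orders PRIMARY KEY (order_id)
      WITH SYNONYMS ('sales orders', 'purchases') COMMENT = 'One row per order (incl. refunds)',
    db.sch.customers PRIMARY KEY (customer_id)
  )
  RELATIONSHIPS (
    orders_to_customers AS orders (customer_id) REFERENCES customers
  )
  COMMENT = 'Orders semantic view'
"""
tables, clauses, warnings = parse_body(ddl)
print(tables["orders"]["synonyms"])    # ['sales orders', 'purchases']
print(tables["customers"]["comment"])  # None
```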

## Rules

| Code | Category | Default | Checks |
|---|---|---|---|
| `SV001` | structural | error | table missing `PRIMARY KEY` |
| `SV002` | structural | error | relationship references a table not in `TABLES(...)` |
| `SV003` | structural | error | relationship join-key arity mismatch |
| `SV004` | structural | error | column references an undeclared table |
| `SV010` | ai-readiness | warn | table/column missing `COMMENT` |
| `SV011` | ai-readiness | warn | column missing `WITH SYNONYMS` |
| `SV012` | ai-readiness | warn | low synonym coverage across the view |
| `SV013` | ai-readiness | warn | comment too short / echoes the column name |
| `SV014` | ai-readiness | warn | view missing top-level `COMMENT` |
| `SV020` | governance | off | column name looks like PII |
| `SV022` | governance | off | DDL missing `COPY GRANTS` |
| `SV000` | parser | warn | a fragment could not be parsed |
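
As an illustration of how the structural rules could run over the canonical model, here is a sketch of `SV001` through `SV003`. It assumes a hypothetical model shape (tables keyed by name with a `primary_key` list; relationships as dicts with `name`, `from`, `to`, `from_columns`, `to_columns`) and, for `SV003`, assumes that an omitted `REFERENCES` column list means the referenced table's primary key. None of these names come from `svguard` itself.

```python
def check_structure(tables, relationships, severity=None):
    """Run SV001-SV003 over a parsed model; returns (code, severity, message) tuples."""
    levels = {"SV001": "error", "SV002": "error", "SV003": "error", **(severity or {})}
    findings = []

    def emit(code, message):
        # Rules configured as "off" never report.
        if levels[code] != "off":
            findings.append((code, levels[code], message))

    # SV001: every declared table needs a PRIMARY KEY.
    for name, meta in tables.items():
        if not meta.get("primary_key"):
            emit("SV001", f"table {name} has no PRIMARY KEY")

    for rel in relationships:
        # SV002: both ends of a relationship must be declared in TABLES(...).
        missing = [t for t in (rel["from"], rel["to"]) if t not in tables]
        for table in missing:
            emit("SV002", f"relationship {rel['name']} references undeclared table {table}")
        if missing:
            continue
        # SV003: join-key arity; empty to_columns falls back to the target's PK (assumption).
        target_cols = rel.get("to_columns") or tables[rel["to"]].get("primary_key", [])
        if target_cols and len(rel["from_columns"]) != len(target_cols):
            emit("SV003", f"relationship {rel['name']} joins "
                          f"{len(rel['from_columns'])} column(s) to {len(target_cols)}")
    return findings


model_tables = {
    "orders": {"primary_key": ["order_id"]},
    "customers": {"primary_key": ["customer_id", "tenant_id"]},
    "products": {"primary_key": []},
}
model_relationships = [
    {"name": "orders_to_customers", "from": "orders", "to": "customers",
     "from_columns": ["customer_id"], "to_columns": []},
    {"name": "orders_to_regions", "from": "orders", "to": "regions",
     "from_columns": ["region_id"], "to_columns": ["region_id"]},
]
for finding in check_structure(model_tables, model_relationships):
    print(finding)
```

Skipping the arity check when a table is undeclared avoids stacking a second, derivative finding on top of `SV002`.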

## AI-readiness score

The score is a transparent weighted combination of four coverage components (weights are configurable), reported with per-component sub-scores:

```
sv_fact_avatax_txn — 100/100  (documentation 100% · synonyms 100% · structure 100% · richness 100%)
```

| Component | Default weight | Measures |
|---|---|---|
| documentation | 0.35 | share of tables + columns + the view with a `COMMENT` |
| synonyms | 0.30 | share of tables + columns with `WITH SYNONYMS` |
| structure | 0.25 | PKs present, relationships resolvable, columns map to declared tables |
| richness | 0.10 | comments meeting min length and not echoing the column name |
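
The combination step can be sketched as a weighted average of the four component ratios. This is a minimal sketch under stated assumptions: each component is a 0.0 to 1.0 ratio, weights are normalized by their sum (so custom weights need not add up to exactly 1), an empty denominator counts as full coverage, and the result is rounded to an integer. `coverage` and `ai_readiness_score` are hypothetical names.

```python
DEFAULT_WEIGHTS = {"documentation": 0.35, "synonyms": 0.30, "structure": 0.25, "richness": 0.10}


def coverage(items, predicate):
    """Share of items satisfying predicate; an empty list counts as full coverage (assumption)."""
    items = list(items)
    return 1.0 if not items else sum(1 for i in items if predicate(i)) / len(items)


def ai_readiness_score(components, weights=DEFAULT_WEIGHTS):
    """Weighted average of 0.0-1.0 component ratios, scaled to an integer 0-100."""
    total = sum(weights.values())
    weighted = sum(weights[name] * components[name] for name in weights)
    return round(100 * weighted / total)


# Usage: documentation measured as COMMENT presence over tables + columns + the view itself.
objects = [{"comment": "Orders fact"}, {"comment": None}, {"comment": "Order id"}, {"comment": "View"}]
components = {
    "documentation": coverage(objects, lambda o: bool(o["comment"])),  # 3 of 4 -> 0.75
    "synonyms": 0.5,
    "structure": 1.0,
    "richness": 0.0,
}
print(ai_readiness_score(components))  # 66
```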

## Configuration

Place a `.semantic-view-guard.yml` file at the repository root:

```yaml
rules:
  SV001: error
  SV011: warn
  SV020: "off"        # quote off/on: YAML 1.1 parsers (e.g. PyYAML) read them as booleans
thresholds:
  min_score: 0
  synonym_coverage: 0.8
  min_comment_length: 15
weights:
  documentation: 0.35
  synonyms: 0.30
  structure: 0.25
  richness: 0.10
```
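
With a YAML 1.1 parser such as PyYAML, an unquoted `off` arrives as the boolean `False`, so a config loader has to reject or reinterpret it. The sketch below rejects booleans with a hint to quote them, and also checks severities, weights, and the synonym-coverage threshold. `validate_config` and its messages are hypothetical, and it assumes the YAML has already been parsed into a plain dict.

```python
VALID_SEVERITIES = {"error", "warn", "off"}


def validate_config(config):
    """Return a list of problems in an already-parsed config dict (empty list means valid)."""
    problems = []
    for code, value in config.get("rules", {}).items():
        if isinstance(value, bool):
            # Unquoted off/on (also yes/no) were turned into booleans by the YAML parser.
            problems.append(f'rules.{code}: got boolean {value}; quote it, e.g. "off"')
        elif not isinstance(value, str) or value not in VALID_SEVERITIES:
            problems.append(f"rules.{code}: unknown severity {value!r}")
    weights = config.get("weights", {})
    if weights and (any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0):
        problems.append("weights: must be non-negative with a positive sum")
    synonym_coverage = config.get("thresholds", {}).get("synonym_coverage", 0.0)
    if not 0.0 <= synonym_coverage <= 1.0:
        problems.append("thresholds.synonym_coverage: must be between 0 and 1")
    return problems


# Usage: `ok` mirrors the YAML above; `bad` is what an unquoted `off` produces.
ok = {"rules": {"SV001": "error", "SV011": "warn", "SV020": "off"},
      "thresholds": {"min_score": 0, "synonym_coverage": 0.8, "min_comment_length": 15},
      "weights": {"documentation": 0.35, "synonyms": 0.30, "structure": 0.25, "richness": 0.10}}
bad = {"rules": {"SV020": False}}
print(validate_config(ok))   # []
print(validate_config(bad))
```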

To suppress rules inline, add a directive comment to the model body:

```sql
-- svguard:disable SV011, SV013
```
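
Honoring that directive could look like the sketch below. It assumes a directive applies to the whole model body and accepts comma-separated codes; `suppressed_codes` and `apply_suppressions` are hypothetical names, and `svguard`'s actual scoping rules may differ.

```python
import re

# `-- svguard:disable CODE[, CODE ...]`, case-insensitive; codes look like SV011.
DISABLE_RE = re.compile(
    r"--\s*svguard:disable\s+([A-Z]{2}\d{3}(?:\s*,\s*[A-Z]{2}\d{3})*)", re.IGNORECASE
)


def suppressed_codes(sql):
    """Collect every rule code disabled anywhere in a model body."""
    codes = set()
    for match in DISABLE_RE.finditer(sql):
        codes.update(code.strip().upper() for code in match.group(1).split(","))
    return codes


def apply_suppressions(findings, sql):
    """Drop findings whose rule code is disabled in the same model body."""
    muted = suppressed_codes(sql)
    return [f for f in findings if f["code"] not in muted]


body = """-- svguard:disable SV011, SV013
CREATE OR REPLACE SEMANTIC VIEW sv_orders TABLES (orders PRIMARY KEY (order_id))"""
findings = [{"code": "SV011"}, {"code": "SV010"}, {"code": "SV013"}]
print(apply_suppressions(findings, body))  # [{'code': 'SV010'}]
```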

## CI

- **GitHub Actions:** see [`action.yml`](action.yml) — runs `svguard` and posts a sticky PR comment.
- **GitLab CI:** see [`examples/gitlab-ci.yml`](examples/gitlab-ci.yml) — posts a sticky MR note and emits a Code Quality artifact (inline MR annotations).
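
The GitLab Code Quality artifact is a JSON array of issue objects with `description`, `check_name`, `fingerprint`, `severity`, and `location` (`path` plus `lines.begin`). A sketch of converting findings into that shape follows; the finding dict layout, the mapping of `error` to `major` and `warn` to `minor`, and the fingerprint recipe are assumptions rather than `svguard`'s documented behavior.

```python
import hashlib
import json

# GitLab accepts info, minor, major, critical, blocker; this mapping is an assumption.
SEVERITY_MAP = {"error": "major", "warn": "minor"}


def to_code_quality(findings):
    """Serialize findings to a GitLab Code Quality report (JSON array)."""
    issues = []
    for f in findings:
        # Line numbers are left out of the key so the fingerprint survives edits above the issue.
        key = f"{f['code']}|{f['path']}|{f['view']}|{f['message']}"
        issues.append({
            "description": f"{f['code']} ({f['view']}): {f['message']}",
            "check_name": f["code"],
            "fingerprint": hashlib.sha256(key.encode("utf-8")).hexdigest(),
            "severity": SEVERITY_MAP.get(f["severity"], "info"),
            "location": {"path": f["path"], "lines": {"begin": f.get("line", 1)}},
        })
    return json.dumps(issues, indent=2)


findings = [{"code": "SV001", "severity": "error", "view": "sv_orders",
             "path": "models/semantic_views/sv_orders.sql", "line": 4,
             "message": "table customers has no PRIMARY KEY"}]
print(to_code_quality(findings))
```

A stable fingerprint is what lets GitLab tell new issues from ones already present on the base branch, which drives the inline MR annotations.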

## Roadmap

- **v1.1** — diff mode: compare a PR against the base branch and classify semantic changes as breaking / risky / safe.
- **v2** — optional `--verify-deployed` (`DESCRIBE SEMANTIC VIEW` against a live account) and a thin dbt-package wrapper.

## License

Apache-2.0.
