Metadata-Version: 2.4
Name: governor-audit
Version: 0.4.14
Summary: Read-only BigQuery cost-audit tool — single-user, gcloud ADC only, no GCS / no GitHub / no dbt installation.
Author-email: Simple Machines <hello@simplemachines.co.nz>
Maintainer-email: Simple Machines <hello@simplemachines.co.nz>
License-Expression: MIT
Project-URL: Homepage, https://github.com/simple-machines/governor
Project-URL: Repository, https://github.com/simple-machines/governor
Project-URL: Issues, https://github.com/simple-machines/governor/issues
Project-URL: Documentation, https://github.com/simple-machines/governor/tree/main/specs/141-production-audit
Project-URL: Changelog, https://github.com/simple-machines/governor/releases?q=audit-
Keywords: bigquery,cost-optimization,dbt,audit,data-engineering,warehouse
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Database
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.14
Description-Content-Type: text/markdown
Requires-Dist: governor-core==0.7.29
Requires-Dist: typer>=0.12.0
Requires-Dist: rich>=13.0.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: uvicorn>=0.30.6
Requires-Dist: pydantic>=2.0.0
Requires-Dist: google-cloud-bigquery>=3.20.0
Requires-Dist: google-auth>=2.30.0

# governor-audit

Read-only BigQuery cost-audit tool for single-user production audits. v0.4.14.

> **Posture**: Single-user. gcloud ADC only. No GCS, no GitHub, no service-account JSON, no dbt installation, no shadow validation. The only thing it talks to over the network is BigQuery — and only to query `INFORMATION_SCHEMA.JOBS_BY_PROJECT`, `INFORMATION_SCHEMA.COLUMNS`, `INFORMATION_SCHEMA.TABLE_STORAGE`, and `INFORMATION_SCHEMA.SCHEMATA_OPTIONS`.

## When to use this vs. the other governor packages

- **`governor-audit`** (this package): you have read access to a prod BigQuery project. You want a fast cost audit + detection findings without touching the dbt source code, running dbt, or setting up cloud infrastructure.
- **`governor-cli`**: you have the dbt project source on your machine and want to run dbt + propose fixes locally.
- **`governor-web`**: you operate the platform; you want shared infrastructure (GCS-backed manifests, GitHub PRs, scheduled syncs) for a team.

## What you get

- **Setup wizard** — first run walks you through ADC sign-in (`gcloud auth application-default login`) → pick BigQuery project + region + lookback (defaults to 24 hours) → first scan kicks off in the background. You land on the configurations page with a spinner card and an auto-refresh, then auto-redirect to the dashboard once the scan completes.
- **Dashboard** — Total / Build / Consumption / Flagged spend KPI cards, top-20 spenders bar chart (click a bar to open its top opportunity), paginated cost-drivers table with click-to-sort columns and a per-row issue count, and a **Storage Optimization** panel listing per-dataset physical-billing opportunities with copyable `ALTER SCHEMA` actions.
- **Detection engine** — every enabled rule from `governor_core.opportunities.rules` runs against each cached job. Each detection candidate persists as its own `Opportunity` row, deduped by `(rule_type, affected_table)`:
  - **Issues** (real cost / performance problems): `slot_contention`, `join_explosion`, `partition_pruning`, `shuffle_spill`, `storage_billing_optimization`. Also collect SQL-rewrite suggestions that fire on the same destination table and surface them inline as additional cards.
  - **Suggestions** (code-quality SQL rewrites): `dead_cte`, `dead_column`, `dead_window_expression`, `unused_aggregation_output`, `redundant_order_by`, `unused_join`, `select_star`, `cross_join_unaggregated`, `self_join_anti_pattern`. Suggestions also surface as standalone opportunities so models with no underlying issue still appear in `/opportunities`.
- **Opportunities workspace** (`/opportunities`) — listing with column-header sort and filters for **Dataset**, **Workload**, **Issue type**, and **Suggestions** (with / with-diff / none). Improvement-only rows show "—" in the Issue column with a tooltip; the Suggestions chip counts the row itself. Click any row to open the detail page.
- **Job history view** (`/jobs/{project.dataset.table}`) — every cached execution of a recurring job rolled up into one page. Cost-trend line chart (one point per UTC day with ≥ 1 execution; days without are skipped) with markers showing when the query body changed mid-window. *Query versions* section lists each distinct query body chronologically with a unified diff against the previous one. *Available improvements* surface only the LATEST version's suggestions; older versions get a `resolved` chip when the latest version no longer triggers a rule, and a `new in this version` chip when a later version regresses. Drill into a single execution via any chart point. See [spec 146](../../specs/146-audit-job-history-trend/).
- **Opportunity detail** — one consistent layout regardless of whether the row is an issue or a standalone suggestion: header pill, stat strip, *Why this fired* + *Recommended fix* (text), then *Available improvements* with each deterministic SQL diff in tabbed Diff / Original SQL cards (rendered from `governor_core.solutions.templates`). Plus *Evidence* and *BigQuery query* lookup panels.
- **Settings** — three rule pages plus account / appearance / LLM:
  - **Issues** (`/admin/settings/issues`) — toggle the five issue rules.
  - **Suggestions** (`/admin/settings/suggestions`) — toggle the suggestion rules.
  - **Storage Billing** (`/admin/settings/storage-billing`) — pricing parameters for the storage billing rule.
  - Plus **Account** (gcloud principal + ADC probe), **Appearance** (light / dark / system), **AI / LLM** (Gemini API key, optional — reviewer code lands in a future release).
  - Audit owns its own enable defaults: every rule is on unless you've explicitly toggled it off. The cloud catalog's per-rule defaults are intentionally ignored here.
- **Scan query preview** — the configurations page shows the exact `INFORMATION_SCHEMA` SQL the next scan will run, with resolved timestamps, on a Jobs / Columns / Storage tabbed panel and a copy button.
- **Scan history** — every scan run logged with timestamp, status (running / succeeded / failed), project, region, lookback, and job count. Failed scans carry a tooltip with the failure reason.
- **Background scans** — clicking *Run scan* on the configurations page (or submitting the setup wizard) kicks off the scan in a daemon thread and lands you back on the page immediately with a spinner card and a 5-second meta-refresh until completion.

## Quickstart

```sh
gcloud auth application-default login
uv tool install governor-audit
governor-audit start
# open http://localhost:8765 — first run goes through the setup wizard
```

The wizard saves config + auto-runs your first scan; you don't need to call `init` or `scan` manually unless you prefer the CLI.

CLI alternatives:

```sh
governor-audit init --project prod-warehouse-123 --region us
governor-audit scan                # uses config defaults (24-hour lookback)
governor-audit scan --days 7       # override the lookback for this run
governor-audit status              # connection / cache snapshot
governor-audit reset -y            # wipe cached scan data; keep config
governor-audit reset-config -y     # drop config.json so next start re-enters setup
governor-audit stop                # terminate the managed web server
```

See the spec quickstarts for the full first-audit walkthrough:

- [spec 141 quickstart](../../specs/141-production-audit/quickstart.md) — original audit MVP
- [spec 144](../../specs/144-audit-query-only-rules/) — query-only rule catalog and synthetic manifest
- [spec 145](../../specs/145-audit-storage-billing-data/) — TABLE_STORAGE ingestion + dashboard storage panel

## Architecture

- **Storage**: SQLite at `~/.governor-audit/state.db` via `governor_core.db.sqlite_compat`. Persisted shapes:
  - `BigQueryJob` — raw INFORMATION_SCHEMA rows.
  - `TableColumnMetadata` — column lists for `SELECT *` expansion.
  - `TableStorageMetric` — per-table byte counts + per-dataset billing model — feeds the storage-billing rule.
  - `Opportunity` — detection findings.
  - `ScanRun` (audit-only) — every scan attempt with status / timing / project / region / lookback. Drives the Scan history table and the auto-refresh state.
  - `AuditOpportunityMetadata` (audit-only) — dbt attribution + query hash sidecar tied to `Opportunity`.
- **Scan replaces previous data per (project, region)**: each new scan wipes the prior scan's `BigQueryJob` / `Opportunity` / `TableColumnMetadata` / `TableStorageMetric` rows for the same `config_id` before persisting fresh ones. `ScanRun` history is preserved (it's the audit log).
- **Project switching**: `(project, region)` hashes to a deterministic `config_id` (`governor_audit.scan.sentinels.config_id_for`). Every dashboard / opportunities / job-detail read is scoped to the active config_id, so switching projects in the configurations form shows only the new project's data.
- **Auth**: gcloud Application Default Credentials only — `google.auth.default()`. No service-account JSON. No browser OAuth.
- **Workload classification**: manifest-free heuristic — dbt-originated CTAS / MERGE / INSERT / UPDATE / DELETE → `build`; non-dbt SELECT → `consumption`; ambiguous → `other`. Driven by the `/* {"app": "dbt"` comment-prefix the dbt-bigquery adapter prepends.
- **Synthetic manifest** (spec 144): audit reuses every `governor_core` rule unmodified by building a synthetic dbt-shaped manifest from `BigQueryJob` rows. Each row becomes a model node keyed by destination table; CTAS / MERGE wrappers are stripped so manifest-driven analyzers see the inner `SELECT`. The same pattern feeds `governor_core.solutions.templates` for deterministic before/after SQL diffs.
- **Config migrations**: `~/.governor-audit/config.json` carries a `schema_version` and runs forward-only migrations on every load (currently v1 → v5). Each migration is a no-op when the payload is already at-or-above its target. The current migration set drops the deprecated v1 `manifest` field, resets stale per-rule overrides, and force-enables suggestion rules. `governor-audit reset-config` is the escape hatch when you want a totally fresh config.
- **Loopback only**: the FastAPI app rejects any request whose `Host:` header isn't a localhost variant. Not a public service.

## Versioning

`governor-audit` ships on its own version track, decoupled from the cloud bundle (`governor-core` / `governor-web` / `governor-cli` / `governor-bq`). Audit `v0.3.x` and cloud `v0.7.x` coexist. See [scripts/release-audit.sh](../../scripts/release-audit.sh) for the release flow.

## License

MIT.
