Metadata-Version: 2.4
Name: turbine-data
Version: 0.7.11
Summary: A CLI tool for working with data products and contracts
Author-email: "Chibrani - Derks, Yassin" <yassin.chibrani-derks@enexis.nl>
Classifier: Development Status :: 4 - Beta
Requires-Python: <3.15,>=3.12
Requires-Dist: alembic>=1.18.4
Requires-Dist: croniter>=6.2.2
Requires-Dist: cyclopts>=4.8.0
Requires-Dist: fastapi-pagination>=0.15.12
Requires-Dist: fastapi>=0.135.1
Requires-Dist: httpx>=0.28.1
Requires-Dist: jinja2>=3.1.6
Requires-Dist: jsonschema-rs>=0.46.0
Requires-Dist: loguru>=0.7.3
Requires-Dist: lsprotocol>=2025.0.0
Requires-Dist: networkx>=3.6.1
Requires-Dist: numpy>=2.4.3
Requires-Dist: open-data-contract-standard>=3.1.2
Requires-Dist: packaging>=26.2
Requires-Dist: pathspec>=0.12
Requires-Dist: prompt-toolkit>=3.0.52
Requires-Dist: pydantic-settings>=2.13.1
Requires-Dist: pygls>=2.1.1
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: rich>=14.3.3
Requires-Dist: ruamel-yaml<0.18.0,>=0.17.0
Requires-Dist: scipy>=1.17.1
Requires-Dist: sqlakeyset>=2.0.1775222100
Requires-Dist: sqlalchemy>=2.0.48
Requires-Dist: sqlglot>=29.0.1
Requires-Dist: sqlmodel>=0.0.37
Requires-Dist: tomli-w>=1.2.0
Requires-Dist: tree-sitter-yaml>=0.7.2
Requires-Dist: tree-sitter>=0.25.2
Requires-Dist: uvicorn>=0.43.0
Provides-Extra: all
Requires-Dist: adbc-driver-manager>=1.0; extra == 'all'
Requires-Dist: adbc-driver-postgresql>=1.0; extra == 'all'
Requires-Dist: cryptography>=42; extra == 'all'
Requires-Dist: duckdb-engine>=0.13; extra == 'all'
Requires-Dist: duckdb>=0.9; extra == 'all'
Requires-Dist: fastapi-pagination>=0.15.12; extra == 'all'
Requires-Dist: fastapi>=0.135.1; extra == 'all'
Requires-Dist: httpx>=0.28.1; extra == 'all'
Requires-Dist: lsprotocol>=2025.0.0; extra == 'all'
Requires-Dist: openpyxl>=3.1.5; extra == 'all'
Requires-Dist: pandas>=2.0; extra == 'all'
Requires-Dist: plotly>=6.6.0; extra == 'all'
Requires-Dist: polars>=1.0; extra == 'all'
Requires-Dist: psycopg[binary]>=3.1; extra == 'all'
Requires-Dist: pyarrow>=15.0; extra == 'all'
Requires-Dist: pygls>=2.0.1; extra == 'all'
Requires-Dist: snowflake-connector-python>=3.0; extra == 'all'
Requires-Dist: snowflake-sqlalchemy>=1.7; extra == 'all'
Requires-Dist: sqlakeyset>=2.0.1775222100; extra == 'all'
Requires-Dist: sqlmodel>=0.0.37; extra == 'all'
Requires-Dist: streamlit-antd-components>=0.3.2; extra == 'all'
Requires-Dist: streamlit-echarts>=0.6.0; extra == 'all'
Requires-Dist: streamlit-extras>=1.3.0; extra == 'all'
Requires-Dist: streamlit>=1.56.0; extra == 'all'
Requires-Dist: uvicorn>=0.43.0; extra == 'all'
Provides-Extra: api
Requires-Dist: fastapi-pagination>=0.15.12; extra == 'api'
Requires-Dist: fastapi>=0.135.1; extra == 'api'
Requires-Dist: httpx>=0.28.1; extra == 'api'
Requires-Dist: sqlakeyset>=2.0.1775222100; extra == 'api'
Requires-Dist: sqlmodel>=0.0.37; extra == 'api'
Requires-Dist: uvicorn>=0.43.0; extra == 'api'
Provides-Extra: dashboard
Requires-Dist: httpx>=0.28.1; extra == 'dashboard'
Requires-Dist: openpyxl>=3.1.5; extra == 'dashboard'
Requires-Dist: plotly>=6.6.0; extra == 'dashboard'
Requires-Dist: streamlit-antd-components>=0.3.2; extra == 'dashboard'
Requires-Dist: streamlit-echarts>=0.6.0; extra == 'dashboard'
Requires-Dist: streamlit-extras>=1.3.0; extra == 'dashboard'
Requires-Dist: streamlit>=1.56.0; extra == 'dashboard'
Provides-Extra: duckdb
Requires-Dist: duckdb-engine>=0.13; extra == 'duckdb'
Requires-Dist: duckdb>=0.9; extra == 'duckdb'
Requires-Dist: fastapi-pagination>=0.15.12; extra == 'duckdb'
Requires-Dist: fastapi>=0.135.1; extra == 'duckdb'
Requires-Dist: httpx>=0.28.1; extra == 'duckdb'
Requires-Dist: lsprotocol>=2025.0.0; extra == 'duckdb'
Requires-Dist: pandas>=2.0; extra == 'duckdb'
Requires-Dist: polars>=1.0; extra == 'duckdb'
Requires-Dist: pyarrow>=15.0; extra == 'duckdb'
Requires-Dist: pygls>=2.0.1; extra == 'duckdb'
Requires-Dist: sqlakeyset>=2.0.1775222100; extra == 'duckdb'
Requires-Dist: sqlmodel>=0.0.37; extra == 'duckdb'
Requires-Dist: uvicorn>=0.43.0; extra == 'duckdb'
Provides-Extra: duckdb-minimal
Requires-Dist: duckdb-engine>=0.13; extra == 'duckdb-minimal'
Requires-Dist: duckdb>=0.9; extra == 'duckdb-minimal'
Provides-Extra: graph
Requires-Dist: networkx>=3.6.1; extra == 'graph'
Requires-Dist: scipy>=1.17.1; extra == 'graph'
Provides-Extra: lsp
Requires-Dist: lsprotocol>=2025.0.0; extra == 'lsp'
Requires-Dist: pygls>=2.0.1; extra == 'lsp'
Provides-Extra: postgres
Requires-Dist: adbc-driver-manager>=1.0; extra == 'postgres'
Requires-Dist: adbc-driver-postgresql>=1.0; extra == 'postgres'
Requires-Dist: fastapi-pagination>=0.15.12; extra == 'postgres'
Requires-Dist: fastapi>=0.135.1; extra == 'postgres'
Requires-Dist: httpx>=0.28.1; extra == 'postgres'
Requires-Dist: lsprotocol>=2025.0.0; extra == 'postgres'
Requires-Dist: pandas>=2.0; extra == 'postgres'
Requires-Dist: polars>=1.0; extra == 'postgres'
Requires-Dist: psycopg[binary]>=3.1; extra == 'postgres'
Requires-Dist: pyarrow>=15.0; extra == 'postgres'
Requires-Dist: pygls>=2.0.1; extra == 'postgres'
Requires-Dist: sqlakeyset>=2.0.1775222100; extra == 'postgres'
Requires-Dist: sqlmodel>=0.0.37; extra == 'postgres'
Requires-Dist: uvicorn>=0.43.0; extra == 'postgres'
Provides-Extra: postgres-minimal
Requires-Dist: psycopg[binary]>=3.1; extra == 'postgres-minimal'
Provides-Extra: python-checks
Requires-Dist: pandas>=2.0; extra == 'python-checks'
Requires-Dist: polars>=1.0; extra == 'python-checks'
Requires-Dist: pyarrow>=15.0; extra == 'python-checks'
Provides-Extra: snowflake
Requires-Dist: cryptography>=42; extra == 'snowflake'
Requires-Dist: fastapi-pagination>=0.15.12; extra == 'snowflake'
Requires-Dist: fastapi>=0.135.1; extra == 'snowflake'
Requires-Dist: httpx>=0.28.1; extra == 'snowflake'
Requires-Dist: lsprotocol>=2025.0.0; extra == 'snowflake'
Requires-Dist: pandas>=2.0; extra == 'snowflake'
Requires-Dist: polars>=1.0; extra == 'snowflake'
Requires-Dist: pyarrow>=15.0; extra == 'snowflake'
Requires-Dist: pygls>=2.0.1; extra == 'snowflake'
Requires-Dist: snowflake-connector-python>=3.0; extra == 'snowflake'
Requires-Dist: snowflake-sqlalchemy>=1.7; extra == 'snowflake'
Requires-Dist: sqlakeyset>=2.0.1775222100; extra == 'snowflake'
Requires-Dist: sqlmodel>=0.0.37; extra == 'snowflake'
Requires-Dist: uvicorn>=0.43.0; extra == 'snowflake'
Provides-Extra: snowflake-minimal
Requires-Dist: cryptography>=42; extra == 'snowflake-minimal'
Requires-Dist: snowflake-connector-python>=3.0; extra == 'snowflake-minimal'
Requires-Dist: snowflake-sqlalchemy>=1.7; extra == 'snowflake-minimal'
Description-Content-Type: text/markdown

# Turbine

**Contract-driven data quality for data products.**

[![PyPI](https://img.shields.io/pypi/v/turbine-data.svg)](https://pypi.org/project/turbine-data/)
[![Python](https://img.shields.io/pypi/pyversions/turbine-data.svg)](https://pypi.org/project/turbine-data/)
[![ODCS](https://img.shields.io/badge/ODCS-v3.1.0-green.svg)](https://github.com/bitol-io/open-data-contract-standard)

Turbine turns a YAML data contract into running quality checks. You declare a table's schema, ownership, freshness expectations, and validity rules in one file; Turbine validates the YAML offline, checks that the live database matches it, runs every quality check against the data, scores the result, and exposes everything over a REST API and dashboard.

It uses [ODCS v3.1.0](https://github.com/bitol-io/open-data-contract-standard), so contracts round-trip with other ODCS-compatible tools.

## Example

A contract is a single YAML file:

```yaml
kind: DataContract
apiVersion: v3.1.0
id: orders
domain: sales
version: "1.0.0"
status: active

slaProperties:
  - property: latency
    element: public.orders.created_at
    value: 24
    unit: hour

schema:
  - name: public.orders
    properties:
      - name: order_id
        logicalType: integer
        required: true
        primaryKey: true
      - name: amount
        logicalType: number
        required: true
      - name: status
        logicalType: string
        required: true
      - name: created_at
        logicalType: timestamp
        required: true
```
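
To make the `slaProperties` entry concrete: one plausible reading of a 24-hour `latency` on `public.orders.created_at` is that the newest `created_at` value must be no more than 24 hours old. The sketch below illustrates that reading in plain Python. The function name `is_fresh` and the exact comparison are assumptions for illustration, not Turbine's implementation.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(newest: datetime, now: datetime, value: int = 24, unit: str = "hour") -> bool:
    """Return True if `newest` is at most `value` units older than `now`.

    Hypothetical helper: it mirrors the contract's `value: 24` / `unit: hour`,
    but the interpretation is an assumption, not Turbine's own logic.
    """
    # Only the units needed for this illustration are mapped.
    seconds_per_unit = {"minute": 60, "hour": 3600, "day": 86400}
    allowed = timedelta(seconds=value * seconds_per_unit[unit])
    return now - newest <= allowed


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_fresh(datetime(2025, 1, 2, 0, 0, tzinfo=timezone.utc), now))    # newest row is 12 hours old
print(is_fresh(datetime(2024, 12, 31, 0, 0, tzinfo=timezone.utc), now))  # newest row is 60 hours old
```

In the contract itself, this rule is just the four lines under `slaProperties`.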

Run every check on it:

```bash
turbine check --datasource default orders.yml
```

You get a per-check verdict, a quality score per dimension (completeness, accuracy, consistency, timeliness, validity), and the failing rows persisted for follow-up.
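
As a rough illustration of a per-dimension score, the sketch below computes the pass rate of checks grouped by dimension. The `CheckResult` type, the sample results, and the pass-rate formula are all assumptions for illustration; Turbine's actual scoring and weighting may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical record of one check's verdict (not a Turbine type)."""
    name: str
    dimension: str
    passed: bool


def score_by_dimension(results: list[CheckResult]) -> dict[str, float]:
    """Return the fraction of passing checks for each dimension."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for result in results:
        totals[result.dimension] += 1
        passes[result.dimension] += int(result.passed)
    return {dim: passes[dim] / totals[dim] for dim in totals}


# Made-up sample verdicts, loosely named after the columns in the contract above.
results = [
    CheckResult("order_id_not_null", "completeness", True),
    CheckResult("amount_not_null", "completeness", False),
    CheckResult("created_at_freshness", "timeliness", True),
]
print(score_by_dimension(results))
```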

## Installation

```bash
uv add "turbine-data[duckdb]"        # local files, zero credentials
uv add "turbine-data[postgres]"      # PostgreSQL
uv add "turbine-data[snowflake]"     # Snowflake
uv add "turbine-data[all]"           # every driver + dashboard
```

Requires Python 3.12 through 3.14 (`>=3.12,<3.15`). The PyPI package is `turbine-data`; the CLI is `turbine`.

## Quick start

```bash
# 1. Scaffold a project
uv run turbine init --database duckdb

# 2. Configure credentials
cp .env.example .env

# 3. Validate and run
uv run turbine lint  src/<project>/contracts/example.yml
uv run turbine check src/<project>/contracts/example.yml --datasource default
```

`turbine init --demo` scaffolds a fully populated example project with real data and multiple contracts.

## Features

- **YAML contracts** in ODCS v3.1.0 — schema, ownership, SLAs, quality checks in one file
- **Built-in check types** — missing, duplicate, invalid values, freshness, row count, custom SQL, Python, group, and window checks (z-score, spike, flatline)
- **Schema drift detection** — compare your contract to the live database before running a single check
- **Dimension-aware scoring** — every check is weighted by its quality dimension (completeness, accuracy, consistency, timeliness, validity)
- **Row-level flagging** — failing rows are persisted in a per-cell bitmap matrix; query which rows failed which checks across runs
- **Management API + dashboard** — `turbine serve` exposes runs, results, scores, and flagged rows over HTTP
- **Code generation** — scaffold SQLModel models and FastAPI routers from contracts
- **IDE support** — a full LSP server with a VS Code extension and, soon, a JetBrains plugin
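
The row-level flagging idea can be pictured as one bit per check for each row. The sketch below is a simplified row-by-check version that uses Python integers as bitmaps. The check names, row ids, and helper functions are hypothetical, and Turbine's per-cell storage format is not shown here.

```python
# Hypothetical check names; each one gets a fixed bit position.
CHECKS = ["missing_amount", "duplicate_order_id", "invalid_status"]
BIT = {name: 1 << i for i, name in enumerate(CHECKS)}


def flag(flags: dict[int, int], row_id: int, check: str) -> None:
    """Set the bit for `check` on `row_id`."""
    flags[row_id] = flags.get(row_id, 0) | BIT[check]


def failed_checks(flags: dict[int, int], row_id: int) -> list[str]:
    """List the checks whose bit is set for `row_id`."""
    mask = flags.get(row_id, 0)
    return [name for name in CHECKS if mask & BIT[name]]


def rows_failing(flags: dict[int, int], check: str) -> list[int]:
    """List the row ids whose bitmap has the bit for `check` set."""
    return sorted(row for row, mask in flags.items() if mask & BIT[check])


flags: dict[int, int] = {}
flag(flags, 101, "missing_amount")
flag(flags, 101, "invalid_status")
flag(flags, 205, "invalid_status")

print(failed_checks(flags, 101))
print(rows_failing(flags, "invalid_status"))
```

Packing verdicts into bits keeps "which rows failed which checks" a cheap mask operation, which is the kind of query the flagged-rows feature describes.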

## Management API

```bash
turbine serve --datasource default --port 8000
```

Endpoints under `/api/v1/manage/`: `/contracts`, `/checks/run`, `/runs/{id}`, `/runs/{id}/results`, `/flagged-rows/{table}`. Browse `/api/v1/manage/docs` for the interactive OpenAPI page.
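
A minimal sketch of calling the API from Python with only the standard library. It assumes the server started by the command above is reachable at `http://localhost:8000` and that `/contracts` answers `GET` with JSON. The helper names `manage_url` and `get_json` are hypothetical; check the OpenAPI page for the real request and response shapes.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: local `turbine serve --port 8000`
PREFIX = "/api/v1/manage"


def manage_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL, the management prefix, and an endpoint path."""
    return f"{base_url.rstrip('/')}{PREFIX}/{path.lstrip('/')}"


def get_json(path: str, timeout: float = 10.0):
    """GET an endpoint and decode the JSON body (assumes a JSON response)."""
    with urlopen(manage_url(path), timeout=timeout) as response:
        return json.load(response)


print(manage_url("/contracts"))
print(manage_url("/runs/42/results"))  # 42 is a placeholder run id
# With a server running: contracts = get_json("/contracts")
```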

```bash
turbine dashboard --port 5173
```

Renders the same data as charts, run history, and a flagged-rows explorer.

## CLI

```bash
turbine lint     <contract.yml>                     # validate the YAML offline
turbine validate <contract.yml> --datasource <name> # compare to live database schema
turbine check    <contract.yml> --datasource <name> # lint + validate + run every check
turbine status                                      # project health, flagged-row counts
turbine bump                                        # update contract versions
turbine new      contract|datasource|check          # scaffold a new resource
turbine generate                                    # SQLModel + FastAPI from contracts
```
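
For CI or scripting, the commands above can be driven from Python. This sketch only builds and runs the documented `lint` and `check` invocations. The contract path, the `default` datasource name, and the assumption that a failing command exits non-zero are illustrative, not confirmed behavior.

```python
import subprocess


def lint_cmd(contract: str) -> list[str]:
    """Build the argv for `turbine lint <contract.yml>`."""
    return ["turbine", "lint", contract]


def check_cmd(contract: str, datasource: str) -> list[str]:
    """Build the argv for `turbine check <contract.yml> --datasource <name>`."""
    return ["turbine", "check", contract, "--datasource", datasource]


def run_pipeline(contract: str, datasource: str) -> int:
    """Run lint, then check; stop at the first non-zero exit code."""
    for cmd in (lint_cmd(contract), check_cmd(contract, datasource)):
        code = subprocess.run(cmd).returncode
        if code != 0:
            return code  # assumption: non-zero means the step failed
    return 0


print(" ".join(check_cmd("orders.yml", "default")))
```

Linting first is optional, since `check` already lints, but it fails fast without touching the database.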

## Ecosystem

**Orchestrators.** Run Turbine Check Runs as native steps in your workflow tool:

- [`dagster-turbine`](integrations/dagster-turbine/) — Check Definitions become Dagster `AssetCheckSpec`s. Partitioned assets scope each Check Run to their partition window.
- [`airflow-turbine`](integrations/airflow-turbine/) — `TurbineOperator` with a deferred trigger; tasks wait on a Check Run without blocking a worker.
- [`turbine-client`](integrations/turbine-client/) — sync + async Python client. Use directly when you need glue beyond the integrations above.

**Editors.**

- [VS Code extension](editors/vscode/) — diagnostics, autocomplete, quick fixes, run-from-editor. Install **Turbine** from the Marketplace.
- [JetBrains plugin](editors/jetbrains/) — same surface for IntelliJ, PyCharm, DataGrip.

## Documentation

Full documentation lives under [`docs/user/docs/`](docs/user/docs/):

- [Getting started](docs/user/docs/getting-started/index.md)
- [Concepts](docs/user/docs/concepts/index.md) — contracts, Check Definition, Check Run, Run Result, scoring
- [Guides](docs/user/docs/guides/index.md) — incremental mode, row flagging, CI workflows
- [CLI reference](docs/user/docs/reference/cli.md)

## Contributing

See [`docs/contributing/`](docs/contributing/) for dev setup, test layout, and code conventions.
