Metadata-Version: 2.4
Name: dbt-gizmosql
Version: 1.11.16
Summary: The GizmoSQL adapter plugin for dbt
Author-email: Philip Moore <philip@gizmodata.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/gizmodata/dbt-gizmosql
Keywords: dbt,gizmosql,flightsql,duckdb,adbc,gizmodata
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dbt-core~=1.11.9
Requires-Dist: dbt-common~=1.38.0
Requires-Dist: dbt-adapters~=1.23.0
Requires-Dist: adbc-driver-gizmosql>=1.1.6
Requires-Dist: duckdb>=1.0.0
Requires-Dist: pandas>=2.0.0
Provides-Extra: dev
Requires-Dist: pip-tools; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: dbt-tests-adapter==1.19.*; extra == "dev"
Requires-Dist: black==24.2.0; extra == "dev"
Requires-Dist: bumpversion; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: flaky; extra == "dev"
Requires-Dist: freezegun==1.5.5; extra == "dev"
Requires-Dist: ipdb; extra == "dev"
Requires-Dist: mypy==1.8.0; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: pytest-csv; extra == "dev"
Requires-Dist: pytest-xdist; extra == "dev"
Requires-Dist: pytz; extra == "dev"
Requires-Dist: tox>=3.13; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: wheel; extra == "dev"
Requires-Dist: gizmosql; extra == "dev"
Requires-Dist: docker; extra == "dev"
Dynamic: license-file

# dbt-gizmosql
A [dbt](https://www.getdbt.com/product/what-is-dbt) adapter for [GizmoSQL](https://gizmodata.com/gizmosql)

[<img src="https://img.shields.io/badge/GitHub-gizmodata%2Fdbt--gizmosql-blue.svg?logo=Github">](https://github.com/gizmodata/dbt-gizmosql)
[<img src="https://img.shields.io/badge/GitHub-gizmodata%2Fgizmosql--public-blue.svg?logo=Github">](https://github.com/gizmodata/gizmosql-public)
[![dbt-gizmosql-ci](https://github.com/gizmodata/dbt-gizmosql/actions/workflows/ci.yml/badge.svg)](https://github.com/gizmodata/dbt-gizmosql/actions/workflows/ci.yml)
[![Supported Python Versions](https://img.shields.io/pypi/pyversions/dbt-gizmosql)](https://pypi.org/project/dbt-gizmosql/)
[![PyPI version](https://badge.fury.io/py/dbt-gizmosql.svg)](https://badge.fury.io/py/dbt-gizmosql)
[![PyPI Downloads](https://img.shields.io/pepy/dt/dbt-gizmosql.svg)](https://pypi.org/project/dbt-gizmosql/)

**[dbt](https://www.getdbt.com/)** enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

dbt is the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.

## GizmoSQL
GizmoSQL is an Apache Arrow Flight-based SQL engine for data warehouses. It is designed to be fast, scalable, and easy to use.

It has DuckDB and SQLite back-ends. You can see more information about GizmoSQL [here](https://gizmodata.com/gizmosql).

## Features

This adapter provides feature parity with [dbt-duckdb](https://github.com/duckdb/dbt-duckdb) for all features applicable to a remote Flight SQL connection:

### Materializations
- **Table** and **View** (SQL and Python)
- **Incremental** with four strategies:
  - `append` -- simple record additions
  - `delete+insert` -- key-based upserts with DuckDB's `DELETE...USING` syntax
  - `merge` -- uses DuckDB's `MERGE` with `UPDATE BY NAME` / `INSERT BY NAME`
  - `microbatch` -- time-based batch processing via `event_time` windows
- **Snapshot** (check and timestamp modes) using UPDATE+INSERT pattern
- **Schema change handling**: `ignore`, `append_new_columns`, `sync_all_columns`, `fail`

### Python Models
Python models execute client-side using a local DuckDB instance for full API compatibility, then ship results to GizmoSQL via ADBC bulk ingest:

```python
def model(dbt, session):
    dbt.config(materialized="table")
    df = dbt.ref("upstream_model")
    df = df.filter(df.amount > 100)
    return df
```

- Supports DuckDB relations, pandas DataFrames, and PyArrow Tables as return types
- `dbt.ref()` and `dbt.source()` fetch data from GizmoSQL as Arrow and expose it as DuckDB relations
- Incremental Python models supported (with proper `dbt.is_incremental` handling)

#### Server-side pushdown with `session.remote_sql()`

Because Python models run client-side, `dbt.ref('big_table').filter(...)` pulls the **entire** upstream table over the network before filtering locally. When you only need a small slice of a large server-side table, use `session.remote_sql(query)` to push the query down to the GizmoSQL server — the filter/aggregation runs server-side and only the result crosses the wire:

```python
def model(dbt, session):
    dbt.config(materialized="table")
    schema = dbt.this.schema
    # Runs on the GizmoSQL server; only matching rows come back.
    return session.remote_sql(
        f"select * from {schema}.big_table where name = 'Joe'"
    )
```

`remote_sql()` returns a chainable local DuckDB relation, so you can combine it with the usual `.filter()`, `.project()`, `.df()`, pandas, etc. The rest of the `session` object behaves exactly like a local DuckDB connection (`session.sql(...)`, `session.register(...)`, …) — `remote_sql` is an additive escape hatch, not a replacement.

### Seed Loading
Seeds are loaded using DuckDB's CSV reader on the client side with ADBC bulk ingest to the server:

- Correct null handling (empty CSV fields become SQL `NULL`, not the string `'null'`)
- Proper type inference (dates detected as `DATE`, integers as `BIGINT`, etc.)
- Supports `column_types` overrides and custom delimiters
- Significantly faster than dbt's default batch `INSERT` path

### Constraints
All constraint types are enforced: `CHECK`, `NOT NULL`, `UNIQUE`, `PRIMARY KEY`, `FOREIGN KEY`.

### Documentation
- `persist_docs` support (`COMMENT ON` for relations and columns)
- Full catalog generation with `dbt docs generate`

### Utility Macros
DuckDB-compatible overrides for: `dateadd`, `last_day`, `listagg`, `split_part`.

### Writing to External Files (server-side)

Because GizmoSQL is essentially a remote, server-side DuckDB, dbt-gizmosql supports the `external` materialization — models that are backed by Parquet, CSV, or JSON files rather than database tables — with the same config surface as [dbt-duckdb](https://github.com/duckdb/dbt-duckdb?tab=readme-ov-file#writing-to-external-files).

The important distinction: **the `COPY` runs on the GizmoSQL server, not on your dbt client**. That's usually what you want. A production GizmoSQL deployment typically sits on a powerful cloud VM (lots of CPU, lots of RAM, fast local disks, a fat NIC, IAM role for blob storage). Pushing the write to that box is far faster than streaming a multi-GB result set back to the client just to write it out again — and the server's credentials/network topology are exactly what's needed to reach the destination.

```sql
{{ config(materialized='external', location='/data/warehouse/fact_orders.parquet') }}
select m.*, s.id is not null as has_source_id
from {{ ref('upstream_model') }} m
left join {{ source('upstream', 'source') }} s using (id)
```

#### Configuration

| Option | Default | Description |
| :---:  | :---:   | --- |
| `location` | [`external_location`](dbt/include/gizmosql/macros/utils/external_location.sql) macro | Server-side path (or S3/GCS/Azure URI) to write to. See below. |
| `format` | `parquet` | One of `parquet`, `csv`, `json`. Inferred from the `location` extension when omitted. |
| `delimiter` | `,` | For CSV, the field delimiter. |
| `options` | `{}` | Any other options for DuckDB's `COPY` statement — e.g. `compression`, `partition_by`, `codec`, `per_thread_output`. |
| `parquet_read_options` | `{'union_by_name': False}` | Options passed to `read_parquet()` when building the read-side view. |
| `csv_read_options` | `{'auto_detect': True}` | Options passed to `read_csv()`. |
| `json_read_options` | `{'auto_detect': True}` | Options passed to `read_json()`. |

If `location` is omitted, the file is written to `{external_root}/{model_name}.{format}`. Set `external_root` in your profile to control the default write location (local path or cloud URI):

```yaml
my-gizmosql-db:
  target: dev
  outputs:
    dev:
      type: gizmosql
      host: gizmosql.prod.example.com
      port: 31337
      auth_type: external
      use_encryption: True
      external_root: "s3://my-warehouse/dbt-output"
```

The `external_root` is resolved **on the GizmoSQL server**, so any path/URI the server's DuckDB backend can reach works — local filesystem paths, `s3://...`, `gs://...`, `azure://...`, etc. Any credentials needed to write there live on the server, not on your dbt client.

After the write, dbt-gizmosql creates a view over the file via `read_parquet` / `read_csv` / `read_json`, so downstream models can `ref()` the external model like any other relation.

#### Partitioning example

```sql
{{ config(
    materialized='external',
    format='parquet',
    options={'partition_by': 'year, month', 'compression': 'zstd'}
) }}
select * from {{ ref('fact_events') }}
```

#### Notes and limitations

- The directory referenced by `external_root` (or the parent directory of an explicit `location`) must already exist on the server — DuckDB's `COPY ... TO '<file>'` does not create parent directories for single-file writes. Cloud URIs like `s3://bucket/prefix` don't have this constraint.
- Incremental strategies are not supported on `external` models — each run fully replaces the file(s).
- dbt-duckdb's `plugin` / `glue_register` options are **not** supported: those are a client-side feature of dbt-duckdb with no analogue on the server. Setting either will produce a clear compile-time error.

## Installation

### Option 1 - from PyPi
```shell
# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

pip install --upgrade pip

python -m pip install dbt-core dbt-gizmosql
```

### Option 2 - from source (for development)
```shell
git clone https://github.com/gizmodata/dbt-gizmosql

cd dbt-gizmosql

# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

# Upgrade pip, setuptools, and wheel
pip install --upgrade pip setuptools wheel

# Install the dbt GizmoSQL adapter - in editable mode with dev dependencies
pip install --editable .[dev]
```

## Configuration

### Profile setup

Add the following to your `~/.dbt/profiles.yml` (change values to match your environment):

```yaml
my-gizmosql-db:
  target: dev
  outputs:
    dev:
      type: gizmosql
      host: localhost
      port: 31337
      database: dbt
      user: [username]
      password: [password]
      use_encryption: True
      tls_skip_verify: True
      threads: 2
```

### OAuth/SSO Authentication
For browser-based OAuth/SSO, use `auth_type: external` -- no username or password needed:
```yaml
my-gizmosql-db:
  target: dev
  outputs:
    dev:
      type: gizmosql
      host: gizmosql.example.com
      port: 31337
      auth_type: external
      use_encryption: True
      threads: 2
```

## Architecture

This adapter connects to GizmoSQL via Apache Arrow Flight SQL using the [ADBC](https://arrow.apache.org/adbc/) driver (`adbc-driver-gizmosql`). Key architectural decisions:

- **Autocommit mode**: Each statement auto-commits immediately. Flight SQL's `PREPARE` phase validates against committed catalog state, so explicit transactions would cause DDL from earlier statements to be invisible to later ones.
- **Client-side DuckDB**: Seeds and Python models use a local DuckDB instance for processing, with results shipped to the server via ADBC bulk ingest (Arrow columnar format over gRPC).
- **MERGE BY NAME**: Incremental merges use DuckDB's `UPDATE BY NAME` / `INSERT BY NAME` syntax, which is resilient to column ordering differences.

## Versioning

This adapter follows [semantic versioning](https://semver.org/). The major.minor version tracks dbt-core (e.g., dbt-core 1.11.x -> dbt-gizmosql 1.11.x).

## Reporting bugs and contributing code

- Want to report a bug or request a feature? Open [an issue](https://github.com/gizmodata/dbt-gizmosql/issues)
- Want to contribute? Pull requests are welcome

## Code of Conduct

Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the [dbt Code of Conduct](https://community.getdbt.com/code-of-conduct).
