Metadata-Version: 2.4
Name: dlt-firebolt
Version: 0.1.0
Summary: dlt destination for Firebolt (staged Parquet + COPY INTO)
Project-URL: Homepage, https://github.com/firebolt-analytics/dlt-firebolt
Project-URL: Repository, https://github.com/firebolt-analytics/dlt-firebolt
Author: Firebolt Analytics
License: Apache-2.0
License-File: LICENSE
Keywords: data-pipeline,dlt,etl,firebolt
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Database
Requires-Python: >=3.10
Requires-Dist: dlt[parquet,s3,sqlalchemy]>=1.0.0
Requires-Dist: firebolt-sqlalchemy
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: python-dotenv; extra == 'dev'
Requires-Dist: requests; extra == 'dev'
Description-Content-Type: text/markdown

# dlt-firebolt

Prototype [dlt](https://dlthub.com/) destination for [Firebolt](https://www.firebolt.io/).

Loads dlt pipelines into Firebolt using **filesystem staging (Parquet on S3) + `COPY INTO`**, the same pattern as dlt's Snowflake and Redshift destinations.

## Status

Spike complete and hardened. Packaging is done; PyPI publish and upstream prep are in progress.

| Phase | What it proved |
|-------|----------------|
| 1 | dlt → S3 Parquet → manual COPY INTO |
| 2 | Generic `sqlalchemy` destination is not viable on Firebolt |
| 3 | Native `destination="firebolt"` end-to-end |
| 4 | Append / merge / replace disposition scripts |

See [SPIKE.md](SPIKE.md) for spike notes.

## License

Apache License 2.0 — see [LICENSE](LICENSE).

## Install

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env   # fill in Firebolt + S3 creds
```

Or install dependencies only (no editable package):

```bash
pip install -r requirements.txt
pip install -r requirements-dev.txt
```

## Quick start (Phase 3 demo)

Requires:

- Firebolt `CREATE LOCATION` for your S3 bucket — set `FIREBOLT_S3_LOCATION_NAME` to the location name (e.g. `sprinto_s3`)
- HubSpot private app token in `.env` (demo only)
- AWS credentials for S3 staging

```bash
export AWS_PROFILE=your-profile
python phase3_hubspot_to_firebolt.py
```

Optional: copy `.dlt/secrets.toml.example` to `.dlt/secrets.toml` and run with `DLT_USE_SECRETS=1`.

Before running demos, validate credentials:

```bash
python check_firebolt_env.py
```

### Disposition checks (Phase 4)

Run each command separately, one per line:

```bash
python phase4_dispositions.py --mode merge
python phase4_dispositions.py --mode append
python phase4_dispositions.py --mode append
python phase4_dispositions.py --mode replace
```

For append, run the command twice and confirm the row count grows.

Verify in Firebolt (default dataset `demo`):

```sql
SELECT COUNT(*) FROM demo_hubspot_contacts;
```
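To make the expected row counts concrete, here is a minimal sketch of the three dispositions against an in-memory SQLite table. It is illustrative only: the `load` helper and the `contacts` table are hypothetical, SQLite stands in for Firebolt, and the SQL the destination actually generates may differ. Merge is modeled as delete-insert, as noted in the roadmap.

```python
import sqlite3


def load(conn, table, rows, disposition):
    """Apply one batch of (id, email) rows and return the resulting row count."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, email TEXT)")
    if disposition == "replace":
        # Replace: clear the table, then insert the new batch.
        conn.execute(f"DELETE FROM {table}")
    elif disposition == "merge":
        # Merge via delete-insert: remove rows whose key appears in the batch.
        conn.executemany(f"DELETE FROM {table} WHERE id = ?", [(r[0],) for r in rows])
    elif disposition != "append":
        raise ValueError(f"unknown disposition: {disposition}")
    # All three dispositions end by inserting the batch.
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
batch = [(1, "a@example.com"), (2, "b@example.com")]

append_once = load(conn, "contacts", batch, "append")
append_twice = load(conn, "contacts", batch, "append")  # count grows
after_merge = load(conn, "contacts", [(1, "new@example.com")], "merge")
after_replace = load(conn, "contacts", batch, "replace")
```

Running append twice doubles the count, which is the behavior the second append command above is meant to confirm. Merge drops every existing row with a matching key before inserting, and replace leaves only the latest batch.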

## Usage in a dlt pipeline

```python
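import sys
from pathlib import Path

# Put this script's directory on sys.path so the local firebolt_dest package is importable.
sys.path.insert(0, str(Path(__file__).resolve().parent))

import dlt
from firebolt_dest.configuration import make_firebolt_pipeline


@dlt.resource(name="orders")
def my_resource():
    # Placeholder rows; replace with your own source data.
    yield [{"id": 1, "status": "new"}]


pipeline = make_firebolt_pipeline(
    pipeline_name="my_pipeline",
    dataset_name="my_dataset",
)

# Stage the load as Parquet, the format this destination copies from S3.
pipeline.run(my_resource(), loader_file_format="parquet")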
```

Or with `.dlt/secrets.toml`:

```python
pipeline = make_firebolt_pipeline(
    pipeline_name="my_pipeline",
    dataset_name="my_dataset",
    from_secrets=True,
)
```

Tables land as `{dataset}_{table}` (e.g. `my_dataset_orders`).
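As a rough illustration of that naming rule, the helper below joins dataset and table with an underscore. `firebolt_table_name` is a hypothetical function, not part of `firebolt_dest`, and it ignores any identifier normalization that dlt or the destination may apply.

```python
def firebolt_table_name(dataset: str, table: str) -> str:
    """Return the flat `{dataset}_{table}` name described above (assumed rule)."""
    return f"{dataset}_{table}"


orders_table = firebolt_table_name("my_dataset", "orders")
demo_table = firebolt_table_name("demo", "hubspot_contacts")
```

With the default `demo` dataset this yields `demo_hubspot_contacts`, the table queried in the Phase 4 verification step.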

Connection details come from environment variables (see [.env.example](.env.example)) or from `.dlt/secrets.toml` (see [.dlt/secrets.toml.example](.dlt/secrets.toml.example)).

## Layout

```
firebolt_dest/          # destination implementation (forked from dlt's Redshift COPY pattern)
  factory.py            # registers destination="firebolt"
  client.py             # COPY load jobs
  sql_client.py         # Firebolt SQLAlchemy client
  copy_sql.py           # COPY INTO SQL generation
  configuration.py      # credentials + S3 location config
phase1_*.py             # spike: dlt → S3 only
phase2_*.py             # spike: dialect smoke test
phase3_*.py             # spike: full native destination demo
phase4_*.py             # append / merge / replace disposition checks
.dlt/config.toml        # non-sensitive dlt defaults (parquet loader)
.dlt/secrets.toml.example
tests/                  # unit tests (no Firebolt connection)
```

## Configuration

| Variable | Required | Description |
|----------|----------|-------------|
| `FIREBOLT_CLIENT_ID` | yes | Service account client ID |
| `FIREBOLT_CLIENT_SECRET` | yes | Service account secret |
| `FIREBOLT_ACCOUNT_NAME` | yes | Firebolt account name |
| `FIREBOLT_DATABASE` | yes | Target database |
| `FIREBOLT_ENGINE` | yes | Engine name |
| `FIREBOLT_S3_LOCATION_NAME` | yes* | Firebolt external location name (must match `CREATE LOCATION`; e.g. `sprinto_s3`) |
| `S3_BUCKET` | yes | Staging bucket |
| `S3_PREFIX` | no | Key prefix (default: `dlt-landing`) |
| `DLT_DATASET_NAME` | no | Demo dataset (default: `demo`) |

Credentials belong in `.env` (gitignored) or `.dlt/secrets.toml` (gitignored). See `.dlt/secrets.toml.example`.
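A quick way to sanity-check the variables in the table is a small script like the one below. It is a sketch, not the contents of `check_firebolt_env.py`: `missing_vars` and `staging_url` are hypothetical helpers, and the `s3://bucket/prefix` layout is an assumption about how the staging path is formed.

```python
import os

# Variables the table marks as required (FIREBOLT_S3_LOCATION_NAME is listed as "yes*").
REQUIRED = [
    "FIREBOLT_CLIENT_ID",
    "FIREBOLT_CLIENT_SECRET",
    "FIREBOLT_ACCOUNT_NAME",
    "FIREBOLT_DATABASE",
    "FIREBOLT_ENGINE",
    "FIREBOLT_S3_LOCATION_NAME",
    "S3_BUCKET",
]

# Defaults taken from the table for the optional variables.
DEFAULTS = {"S3_PREFIX": "dlt-landing", "DLT_DATASET_NAME": "demo"}


def missing_vars(env):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED if not env.get(name)]


def staging_url(env):
    """Build an s3:// staging URL from bucket and prefix (assumed layout)."""
    prefix = env.get("S3_PREFIX") or DEFAULTS["S3_PREFIX"]
    return f"s3://{env['S3_BUCKET']}/{prefix.strip('/')}"


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    print("missing:", ", ".join(missing) if missing else "none")
```

Passing a plain dict instead of `os.environ` makes the helpers easy to unit test without touching real credentials.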

## Tests

```bash
pip install -r requirements-dev.txt
pytest

# Optional: live Firebolt + S3 (requires .env and AWS creds)
FIREBOLT_RUN_INTEGRATION=1 pytest -m integration -v
```

## Roadmap

- [x] Package as installable module (`pip install -e .` / `dlt-firebolt`)
- [x] Config via env vars and `.dlt/secrets.toml` (both live-tested)
- [x] Merge/append/replace dispositions (merge via delete-insert; replace via truncate-and-insert or insert-from-staging)
- [x] Unit tests for COPY and merge SQL generation
- [x] Integration test harness (env-gated)
- [x] Destination README (dlt-style setup doc)
- [ ] PyPI publish (`pip install dlt-firebolt` from PyPI)
- [ ] Upstream PR to [dlt](https://github.com/dlt-hub/dlt) or community listing

## Related

Customer connector demos that consume this pattern live separately in [sprinto-connectors](https://github.com/firebolt-analytics/sprinto-connectors) (private).
