Metadata-Version: 2.4
Name: gojjam
Version: 0.3.0
Summary: A decoupled, SQL-first data engine for ingestion and transformation.
Author: Bisrat Awoke
Description-Content-Type: text/markdown
Requires-Dist: requests
Requires-Dist: duckdb
Requires-Dist: pyyaml
Requires-Dist: pydantic
Requires-Dist: python-dotenv
Requires-Dist: pyarrow
Requires-Dist: sqlglot
Requires-Dist: pandas
Requires-Dist: sqlalchemy
Requires-Dist: psycopg2
Requires-Dist: click
Provides-Extra: iceberg-warehouse-s3
Requires-Dist: pyiceberg[s3fs]; extra == "iceberg-warehouse-s3"
Provides-Extra: iceberg-catalog-sqlite
Requires-Dist: pyiceberg[sqlalchemy]; extra == "iceberg-catalog-sqlite"
Provides-Extra: iceberg-catalog-postgres
Requires-Dist: pyiceberg[sql-postgres]; extra == "iceberg-catalog-postgres"
Provides-Extra: iceberg
Requires-Dist: pyiceberg; extra == "iceberg"
Provides-Extra: aws
Requires-Dist: boto3; extra == "aws"
Provides-Extra: azure
Requires-Dist: azure-storage-blob; extra == "azure"
Provides-Extra: dev
Requires-Dist: build; extra == "dev"

# Gojjam

**The Lightweight, SQL-First Data Engine for Ingestion and Transformation.**

![Status](https://img.shields.io/badge/status-MVP%20%2F%20Alpha-orange)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

> **📖 Documentation:** Read the full setup guides and tutorials at [gojjam-docs](https://gojjam-docs.netlify.app/docs/tutorials/quickstart)

> **⚠️ Work in Progress:** Gojjam is currently in **MVP/Alpha**. The core engine is stable, but connector features are limited. Not yet recommended for production-critical workloads.

Gojjam is a data pipeline engine built for analytics engineers. It lets you build end-to-end data pipelines, from extracting raw data to transforming it, using pure SQL and simple YAML configuration. No Python boilerplate required.

## 🚀 Why Gojjam?

1. **SQL-Only Extraction**
   Extract data from APIs, databases, or cloud storage (such as AWS S3 and Azure Blob Storage) using only SQL. Gojjam treats external data sources as virtual SQL tables. See the [docs](https://gojjam-docs.netlify.app/docs/tutorials/quickstart) to get up and running.

2. **In-Flight Transformation**
   Flatten nested JSON payloads, rename columns, cast data types, and filter records _before_ the data reaches your warehouse. Because this processing happens before loading, the warehouse stores and scans less data, which can substantially reduce downstream compute costs.

3. **Complex Declarative Logic**
   Handle complex extraction requirements, such as API pagination, entirely within SQL using [Calculated Models](https://gojjam-docs.netlify.app/docs/gojjam-ingest/calculated-models). This removes the need for custom Python loop scripts.

---

## 🛠️ Installation

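```bash
pip install gojjam

# Optional extras declared by the package (cloud storage, Iceberg)
pip install "gojjam[aws,azure,iceberg]"
```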

---

## 📖 Quick Start

Gojjam includes a built-in scaffolding engine to get you running in seconds. For a detailed walk-through, see the [5-Minute Quickstart Guide](https://gojjam-docs.netlify.app/docs/tutorials/quickstart).

### 1. Initialize a Project

```bash
gojjam init
```

This creates the following project structure:

- `gojjam_ingest_sources.yml`: Define your data sources.
- `gojjam_ingest_sinks.yml`: Define where your data lands.
- `ingest/`: Put your raw extraction SQL here.
- `transform/`: Put your warehouse transformation SQL here.

### 2. Run the Pipeline

```bash
# Run all ingest and transformation tasks
gojjam run --all
```

---

## 💡 The Gojjam Philosophy

Gojjam treats **APIs as virtual SQL tables**. Instead of writing complex Python scripts to handle pagination and nested JSON, you define the schema and let the engine handle the heavy lifting. This allows data engineers to focus on the **logic** of the data rather than the **plumbing** of the connection.

---

## 🔌 Supported Connectors

Gojjam is designed for extensibility. We currently support the following extraction and loading modules:

| Connector              | Type             | Status / Description                                                                                                                 |
| ---------------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| **HTTP / REST**        | Extractor        | **Stable:** Supports JSON payloads, Basic Auth and JWT authentication, and SQL-based pagination.                                     |
| **PostgreSQL**         | Extractor/Loader | **Stable:** Relational data extraction and loading via `psycopg2`.                                                                   |
| **CSV / Flat Files**   | Extractor        | **Stable:** Local-first data ingestion for quick analysis.                                                                           |
| **DuckDB**             | Extractor        | **Stable:** In-process analytical extraction for fast prototyping.                                                                   |
| **AWS S3**             | Extractor        | **Stable:** Extracts data from AWS S3 buckets.                                                                                       |
| **Azure Blob Storage** | Extractor        | **Stable:** Extracts data from Azure Blob Storage containers.                                                                        |
| **Apache Iceberg**     | Loader           | **Stable:** Supports **PostgreSQL catalogs** with **AWS S3** or **local filesystem** warehouses, and handles schema evolution.        |
| **Terminal**           | Loader           | **Stable:** Debug mode that streams results directly to your console.                                                                |
