Metadata-Version: 2.1
Name: dagster-odp
Version: 0.1.4
Summary: A configuration-driven framework for building Dagster pipelines
Author-email: Jonathan Bhaskar <hello@jonathanb.me>
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Requires-Python: <3.13,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dagster<1.10.0,>=1.9.1
Requires-Dist: dagster-webserver<1.10.0,>=1.9.1
Requires-Dist: dagster-gcp<0.26.0,>=0.25.1
Requires-Dist: dagster-dbt<0.26.0,>=0.25.1
Requires-Dist: dbt-core<1.9.0,>=1.8.8
Requires-Dist: dbt-bigquery<1.9.0,>=1.8.3
Requires-Dist: dbt-duckdb<1.9.0,>=1.8.4
Requires-Dist: dlt[bigquery]<0.6.0,>=0.5.4
Requires-Dist: duckdb<2.0.0,>=1.0.0
Requires-Dist: google-cloud-storage<3.0.0,>=2.18.2
Requires-Dist: fsspec>=2024.10.0
Requires-Dist: gcsfs>=2024.10.0
Requires-Dist: soda-core<3.4.0,>=3.3.22
Requires-Dist: soda-core-bigquery<3.4.0,>=3.3.22
Requires-Dist: soda-core-duckdb<3.4.0,>=3.3.22
Requires-Dist: chevron<0.15.0,>=0.14.0
Requires-Dist: requests<3.0.0,>=2.32.3
Provides-Extra: dev
Requires-Dist: pytest<9.0.0,>=8.3.3; extra == "dev"
Requires-Dist: freezegun<2.0.0,>=1.5.1; extra == "dev"
Requires-Dist: pytest-cov>=6.0.0; extra == "dev"
Requires-Dist: mkdocs-material<10.0.0,>=9.5.44; extra == "dev"
Requires-Dist: mkdocs-glightbox<0.5.0,>=0.4.0; extra == "dev"
Requires-Dist: black>=24.10.0; extra == "dev"
Requires-Dist: flake8>=7.1.1; extra == "dev"
Requires-Dist: flake8-bugbear>=24.10.31; extra == "dev"
Requires-Dist: isort>=5.13.2; extra == "dev"
Requires-Dist: mypy>=1.13.0; extra == "dev"
Requires-Dist: pylint>=3.3.1; extra == "dev"

# dagster-odp (open data platform)

[![PyPI version](https://badge.fury.io/py/dagster-odp.svg)](https://badge.fury.io/py/dagster-odp)
[![Python Versions](https://img.shields.io/pypi/pyversions/dagster-odp.svg)](https://pypi.org/project/dagster-odp/)
[![License](https://img.shields.io/pypi/l/dagster-odp.svg)](https://github.com/runodp/dagster-odp/blob/main/LICENSE)
[![Documentation Status](https://github.com/runodp/dagster-odp/actions/workflows/gh_pages.yml/badge.svg)](https://runodp.github.io/dagster-odp/)
[![CI Status](https://github.com/runodp/dagster-odp/actions/workflows/ci.yml/badge.svg)](https://github.com/runodp/dagster-odp/actions/workflows/ci.yml)
[![Coverage](https://img.shields.io/codecov/c/github/runodp/dagster-odp)](https://codecov.io/gh/runodp/dagster-odp)
[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

dagster-odp lets teams build Dagster pipelines through YAML/JSON configuration rather than Python code. This lowers the learning curve for Dagster, standardizes how pipelines are defined, and speeds up development of data workflows.

## Key Features

- **Configuration-Driven Development**: Build data pipelines using YAML/JSON instead of Python code

- **Pre-built Tasks**:
  - **Google Cloud Operations**: Transfer and export data between GCS and BigQuery, with support for GCS file downloads.
  - **DuckDB Operations**: Load files into DuckDB, execute SQL queries, and export table contents to files.
  - **Utility Operations**: Execute shell commands with configurable environments and working directories.

- **Extensible Framework**: Create custom tasks, sensors, and resources that can be used directly in configuration files

- **Enhanced Modern Data Stack Integration**:
  - **DLT+**: Extended integration with automatic asset creation and granular object handling
  - **DBT+**: Simplified variable management and external source configuration
  - **Soda**: Configuration-driven data quality checks

- **Enhanced Asset Management**:
  - Standardized materialization metadata
  - Simplified dependency management
  - External source handling

- **Flexible Automation**: Configuration-based jobs, schedules, sensors, and partitioning


## Quick Example

Here's a simple pipeline that downloads data and loads it into DuckDB:

```yaml
# odp_config/workflows/pipeline.yaml
assets:
  - asset_key: raw_data
    task_type: url_file_download
    params:
      source_url: https://example.com/data.parquet
      destination_file_path: ./data/raw.parquet

  - asset_key: analyzed_data
    task_type: file_to_duckdb
    depends_on: [raw_data]
    params:
      source_file_uri: "{{raw_data.destination_file_path}}"
      destination_table_id: analyzed_table
```
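
The two ideas in this configuration, `depends_on` ordering and `{{asset_key.field}}` placeholders, can be illustrated with a small standard-library sketch. This is not dagster-odp's implementation: the helper names `execution_order` and `resolve_params` are hypothetical, and the sketch assumes that the values an upstream asset exposes are simply its own `params`, which may differ from what the framework actually records.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical in-memory copy of the workflow above (as if parsed from YAML).
assets = [
    {
        "asset_key": "raw_data",
        "task_type": "url_file_download",
        "params": {
            "source_url": "https://example.com/data.parquet",
            "destination_file_path": "./data/raw.parquet",
        },
    },
    {
        "asset_key": "analyzed_data",
        "task_type": "file_to_duckdb",
        "depends_on": ["raw_data"],
        "params": {
            "source_file_uri": "{{raw_data.destination_file_path}}",
            "destination_table_id": "analyzed_table",
        },
    },
]

# Matches placeholders of the form {{asset_key.field}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def execution_order(assets):
    """Return asset keys so that each asset follows the assets it depends on."""
    graph = {a["asset_key"]: set(a.get("depends_on", [])) for a in assets}
    return list(TopologicalSorter(graph).static_order())


def resolve_params(asset, upstream_values):
    """Replace {{asset_key.field}} placeholders in string params."""

    def substitute(match):
        key, field = match.group(1), match.group(2)
        return str(upstream_values[key][field])

    return {
        name: PLACEHOLDER.sub(substitute, value) if isinstance(value, str) else value
        for name, value in asset["params"].items()
    }


# Assumption: an upstream asset exposes its own params to downstream assets.
upstream_values = {a["asset_key"]: a["params"] for a in assets}
order = execution_order(assets)
resolved = resolve_params(assets[1], upstream_values)
print(order)
print(resolved["source_file_uri"])
```

In this sketch `raw_data` is ordered before `analyzed_data`, and the placeholder in `source_file_uri` is replaced by the path that `raw_data` writes to.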

## Installation

```bash
pip install dagster-odp
```

## Getting Started

1. Create a new project using the Dagster CLI:
```bash
dagster project scaffold --name my-odp-project
cd my-odp-project
```

2. Create the ODP configuration directories:
```bash
mkdir -p odp_config/workflows
```

3. Update your `definitions.py` to build the Dagster definitions from the configuration directory:
```python
from dagster_odp import build_definitions

# Build the Dagster definitions from the odp_config directory created in step 2.
defs = build_definitions("odp_config")
```

4. Start building pipelines by adding YAML or JSON configuration files to the `odp_config/workflows` directory.
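
Because workflows can be written as JSON as well as YAML, a workflow file could also be generated from Python. The following is a rough sketch using only the standard library. The helper `write_workflow` and the file name `pipeline.json` are hypothetical, and it assumes a JSON workflow uses the same keys as the YAML in the Quick Example. The demo writes into a temporary directory rather than a real project.

```python
import json
import tempfile
from pathlib import Path

# Same structure as the Quick Example, expressed as a Python dict.
workflow = {
    "assets": [
        {
            "asset_key": "raw_data",
            "task_type": "url_file_download",
            "params": {
                "source_url": "https://example.com/data.parquet",
                "destination_file_path": "./data/raw.parquet",
            },
        },
        {
            "asset_key": "analyzed_data",
            "task_type": "file_to_duckdb",
            "depends_on": ["raw_data"],
            "params": {
                "source_file_uri": "{{raw_data.destination_file_path}}",
                "destination_table_id": "analyzed_table",
            },
        },
    ]
}


def write_workflow(config_dir, name, workflow):
    """Write a workflow dict to <config_dir>/workflows/<name>.json."""
    workflows_dir = Path(config_dir) / "workflows"
    workflows_dir.mkdir(parents=True, exist_ok=True)
    path = workflows_dir / f"{name}.json"
    path.write_text(json.dumps(workflow, indent=2) + "\n", encoding="utf-8")
    return path


# Demo in a temporary directory; in a real project config_dir would be "odp_config".
with tempfile.TemporaryDirectory() as tmp:
    path = write_workflow(Path(tmp) / "odp_config", "pipeline", workflow)
    loaded = json.loads(path.read_text(encoding="utf-8"))
    print(path.name, [a["asset_key"] for a in loaded["assets"]])
```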

Check out our [Quickstart Guide](https://runodp.github.io/dagster-odp/getting-started/quickstart/) for a complete walkthrough.

## Who Should Use dagster-odp?

- **Data Teams** seeking to standardize pipeline creation
- **Data Analysts/Scientists** who want to create pipelines without extensive coding
- **Data Engineers** looking to reduce boilerplate code and maintenance overhead
- **Organizations** adopting Dagster who want to accelerate development

## Documentation

[Comprehensive documentation](https://runodp.github.io/dagster-odp/) is available, including:

- [Tutorials](https://runodp.github.io/dagster-odp/tutorials/tutorials/)
- [Concepts Guide](https://runodp.github.io/dagster-odp/concepts/concepts/)
- [Integration Guides](https://runodp.github.io/dagster-odp/integrations/integrations/)
- [Reference Documentation](https://runodp.github.io/dagster-odp/reference/reference/)

## Contributing

Contributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) for details on how to submit pull requests, report issues, and contribute to the project.

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
