Metadata-Version: 2.4
Name: acdc_aws_etl_pipeline
Version: 0.9.1
Summary: Tools for ACDC ETL pipeline
Author: JoshuaHarris391
Author-email: harjo391@gmail.com
Requires-Python: >=3.9.5,<4.0.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: awswrangler (>=3.14.0,<4.0.0)
Requires-Dist: boto3
Requires-Dist: dbt-athena (==1.9.4)
Requires-Dist: dbt-core (==1.9.4)
Requires-Dist: gen3 (>=4.27.4,<5.0.0)
Requires-Dist: gen3_validator (>=2.0.0,<3.0.0)
Requires-Dist: numpy (<2.0.0)
Requires-Dist: pyjwt (>=2.10.1,<3.0.0)
Requires-Dist: pytest
Requires-Dist: python-dotenv
Requires-Dist: pytz (>=2025.2,<2026.0)
Requires-Dist: pyyaml (>=6.0.2,<7.0.0)
Requires-Dist: s3fs (==2025.10.0)
Requires-Dist: tenacity (>=8.2,<10.0)
Requires-Dist: tzlocal (>=5.3.1,<6.0.0)
Description-Content-Type: text/markdown

# acdc-aws-etl-pipeline
Infrastructure and code for the ACDC ETL pipeline and data operations in AWS.

## Documentation

### Core Configuration & Management
- [Deployment Configuration Guide](docs/config.md)
- [Dictionary Deployment](docs/dictionary_deployment.md)
- [Service Management](docs/service_management.md)
- [Kubernetes Utilities](docs/k8s_utilities.md)
- [Troubleshooting](docs/troubleshooting.md)

### Data Lifecycle & ETL
- [Data Ingestion](docs/data_ingestion.md)
- [Data Validation](docs/data_validation.md)
- [Data Transformation (dbt)](docs/data_transformation_dbt.md)
- [Data Releases](docs/write_data_release.md)
- [Data Deletion](docs/data_deletion.md)
- [Metadata Deletion by GUID](docs/metadata_deletion_by_guid.md)

### Metadata & Registry Operations
- [REST API Upload to Sheepdog](docs/rest_api_sheepdog_upload.md)
- [IndexD File Registration](docs/indexd_registration.md)
- [Synthetic Data Generation](docs/synthetic_data_generation.md)

### Analysis & Querying
- [Querying Athena](docs/querying_athena.md)
- [Writing Athena Queries to JSON](docs/write_athena_queries_to_json.md)

## Library and source code (`src/acdc_aws_etl_pipeline`)

The Python package in [`src/acdc_aws_etl_pipeline`](src/acdc_aws_etl_pipeline) provides reusable utilities for ingestion, validation, uploads, and Athena/Glue operations used across the pipeline and services.

### Modules

- **`ingest/`**: ingestion helpers for loading source datasets into S3/Glue (see [`ingest/ingest.py`](src/acdc_aws_etl_pipeline/ingest/ingest.py)).
- **`upload/`**: Gen3/Sheepdog metadata submission and deletion utilities (e.g. [`upload/metadata_submitter.py`](src/acdc_aws_etl_pipeline/upload/metadata_submitter.py)).
- **`validate/`**: schema validation utilities and helpers for validation workflows (see [`validate/validate.py`](src/acdc_aws_etl_pipeline/validate/validate.py)).
- **`utils/`**: shared Athena/Glue/dbt/release helpers (e.g. [`utils/athena_utils.py`](src/acdc_aws_etl_pipeline/utils/athena_utils.py), [`utils/release_writer.py`](src/acdc_aws_etl_pipeline/utils/release_writer.py)).
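
As a quick sanity check of the layout above, the following minimal sketch reports which of the four listed subpackages can be located in the current environment. It assumes each directory is importable as a subpackage of `acdc_aws_etl_pipeline`; the helper name `find_available_subpackages` is hypothetical and not part of the library.

```python
import importlib.util

# Names taken from the module list above; assumed to be importable subpackages.
PACKAGE = "acdc_aws_etl_pipeline"
SUBPACKAGES = ("ingest", "upload", "validate", "utils")


def find_available_subpackages(package=PACKAGE, names=SUBPACKAGES):
    """Return a dict mapping each subpackage name to whether it can be located.

    Hypothetical helper for illustration only.
    """
    available = {}
    for name in names:
        try:
            # find_spec imports the parent package, so a missing parent
            # (or a parent that fails to import) raises ImportError.
            spec = importlib.util.find_spec(f"{package}.{name}")
        except ImportError:
            spec = None
        available[name] = spec is not None
    return available


if __name__ == "__main__":
    for name, found in find_available_subpackages().items():
        status = "found" if found else "missing"
        print(f"{PACKAGE}.{name}: {status}")
```

Because the check only locates the subpackages and does not import them, it does not need AWS credentials or a Gen3 endpoint.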

### Local development

To install dependencies and run tests:

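```bash
# Install Poetry, then the project and its dependencies
pip install poetry
poetry install
# Activate the Poetry-managed virtual environment
source "$(poetry env info --path)/bin/activate"
# Run the test suite
poetry run pytest
```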
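
To confirm that the package was installed into the active environment, a small sketch like the one below can help. It uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name is taken from this package's metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="acdc_aws_etl_pipeline"):
    """Return the installed version string, or None if the distribution is absent.

    Hypothetical helper for illustration only.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("acdc_aws_etl_pipeline is not installed in this environment")
    else:
        print(f"acdc_aws_etl_pipeline {found} is installed")
```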

### Install from PyPI

Releases are published to PyPI automatically, so you can also install the package directly with `pip`:

```bash
pip install acdc_aws_etl_pipeline
```