Metadata-Version: 2.4
Name: acdc_aws_etl_pipeline
Version: 0.2.5
Summary: Tools for ACDC ETL pipeline
Author: JoshuaHarris391
Author-email: harjo391@gmail.com
Requires-Python: >=3.9.5,<4.0.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: boto3
Requires-Dist: dbt-athena (==1.9.4)
Requires-Dist: dbt-core (==1.9.4)
Requires-Dist: gen3 (>=4.27.4,<5.0.0)
Requires-Dist: gen3_validator (>=1.1.1,<2.0.0)
Requires-Dist: pytest
Requires-Dist: python-dotenv
Requires-Dist: pytz (>=2025.2,<2026.0)
Requires-Dist: pyyaml (>=6.0.2,<7.0.0)
Requires-Dist: tzlocal (>=5.3.1,<6.0.0)
Description-Content-Type: text/markdown

# acdc-aws-etl-pipeline
Infrastructure and code for the ACDC ETL pipeline and data operations in AWS

## Ingestion
- [ingestion](docs/ingestion.md)
- [upload_synthdata_s3](docs/upload_synthdata_s3.md)

## DBT



## Release Management
- [Writing DBT Releases](docs/write_dbt_release_info.md)


## Deploying the dictionary
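Deploying the dictionary takes two steps. First, pull a tagged version of the dictionary JSON from GitHub. Then upload it to the target S3 location. The concrete example below deploys `v0.6.0` to the testing (UAT) environment.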

```bash
# General usage
bash services/dictionary/pull_dict.sh <raw_dictionary_url>
python3 services/dictionary/upload_dictionary.py <local_dictionary_path> <s3_target_uri>

# Example: deploy dictionary v0.6.0 to the testing (UAT) environment
VERSION=v0.6.0
bash services/dictionary/pull_dict.sh "https://raw.githubusercontent.com/AustralianBioCommons/acdc-schema-json/refs/tags/${VERSION}/dictionary/prod_dict/acdc_schema.json"
python3 services/dictionary/upload_dictionary.py "services/dictionary/schemas/acdc_schema_${VERSION}.json" s3://gen3schema-cad-uat-biocommons.org.au/cad.json
```
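The two deployment steps can be sketched in Python as follows. This is a hypothetical illustration, not the contents of `pull_dict.sh` or `upload_dictionary.py`. The helper names (`dictionary_url`, `parse_s3_uri`, `pull_dictionary`, `upload_dictionary`) are invented here. The sketch assumes that the dictionary is a single JSON file and that AWS credentials are available to `boto3` through its standard credential chain. It checks that the file parses as JSON before upload, so a truncated download is caught before it reaches S3.

```python
"""Hypothetical sketch of the dictionary deployment steps (not the repo's scripts)."""
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Assumption: mirrors the raw GitHub URL layout used in the example above.
RAW_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/AustralianBioCommons/acdc-schema-json/"
    "refs/tags/{version}/dictionary/prod_dict/acdc_schema.json"
)


def dictionary_url(version: str) -> str:
    """Return the raw GitHub URL of the dictionary for a release tag, e.g. 'v0.6.0'."""
    return RAW_URL_TEMPLATE.format(version=version)


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); reject anything else."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Not a valid S3 object URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def pull_dictionary(url: str, dest: Path) -> Path:
    """Download the dictionary, confirm it is valid JSON, and save it to dest."""
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    json.loads(body)  # raises json.JSONDecodeError if the download is not JSON
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(body)
    return dest


def upload_dictionary(local_path: Path, s3_uri: str, client=None) -> None:
    """Validate a local dictionary file, then upload it to the given S3 URI."""
    json.loads(Path(local_path).read_text(encoding="utf-8"))  # fail before uploading
    bucket, key = parse_s3_uri(s3_uri)
    if client is None:
        import boto3  # imported lazily so the helpers above work without AWS access

        client = boto3.client("s3")
    client.upload_file(str(local_path), bucket, key)
```

Under these assumptions, the UAT example corresponds to calling `pull_dictionary(dictionary_url("v0.6.0"), Path("services/dictionary/schemas/acdc_schema_v0.6.0.json"))` and then passing the returned path to `upload_dictionary` with `s3://gen3schema-cad-uat-biocommons.org.au/cad.json`. Accepting an optional `client` lets the upload step be exercised with a stub instead of a live S3 bucket.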

## Generating synthetic metadata
- Run this script to generate synthetic metadata for the studies in the dictionary

```bash
bash services/synthetic_data/generate_synth_metadata.sh
```
