Metadata-Version: 2.3
Name: bioforklift
Version: 0.1.0
Summary: Package for handling bioinformatics data automations between data sources
License: GPL-3.0
Keywords: terra,bigquery,bioinformatics,data-integration
Author: Michal-Babins
Author-email: michal.babinski@theiagen.com
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: google (>=3.0.0,<4.0.0)
Requires-Dist: google-api-core (>=2.24.1,<3.0.0)
Requires-Dist: google-api-python-client (>=2.160.0,<3.0.0)
Requires-Dist: google-cloud-bigquery (>=3.29.0,<4.0.0)
Requires-Dist: google-cloud-storage (>=3.1.0,<4.0.0)
Requires-Dist: mkdocs (>=1.6.1,<2.0.0)
Requires-Dist: mkdocs-material (>=9.6.9,<10.0.0)
Requires-Dist: pandas (>=2.2.3,<3.0.0)
Requires-Dist: pyarrow (>=19.0.1,<20.0.0)
Requires-Dist: pydantic (>=2.10.6,<3.0.0)
Requires-Dist: pytest-mock (>=3.14.0,<4.0.0)
Requires-Dist: pyyaml (>=6.0.2,<7.0.0)
Requires-Dist: requests (>=2.32.3,<3.0.0)
Description-Content-Type: text/markdown

# bioforklift

[![Tests](https://github.com/theiagen/bioforklift/actions/workflows/pytests.yml/badge.svg)](https://github.com/theiagen/bioforklift/actions/workflows/pytests.yml)

Automated data movement and integration library for sample datastores

🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️🏗️

🚧 Under Construction 🚧

### Getting Set Up

This project uses `poetry` for project management.

If you don't have `poetry` installed, install it with:
`pip install poetry`

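Then set up and activate the project's virtual environment. Note that on Poetry 2.x this command prints the activation command rather than activating the current shell itself, so run the command it prints: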
`poetry env activate`

Next, install the dependencies listed in `poetry.lock` with:
`poetry install`

The dependencies are installed at the versions pinned in `poetry.lock`, which is committed to the repository. For more information on poetry, read here: https://python-poetry.org/docs/basic-usage/
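As a quick sanity check after `poetry install`, the sketch below confirms that the interpreter falls inside the package's `Requires-Python` range (`>=3.10,<4.0`) and reports any distributions that are not installed in the active environment. It uses only the standard library. The helper names `python_in_range` and `missing_distributions`, and the short list of packages checked, are illustrative assumptions and not part of bioforklift.

```python
import sys
from importlib import metadata


def python_in_range(version=None):
    """Return True if the (major, minor) version satisfies >=3.10,<4.0."""
    major, minor = (sys.version_info if version is None else version)[:2]
    return (3, 10) <= (major, minor) < (4, 0)


def missing_distributions(names):
    """Return the subset of distribution names not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical spot check of a few dependencies; extend the list as needed.
    wanted = ["pandas", "pydantic", "google-cloud-bigquery"]
    print("Python in supported range:", python_in_range())
    print("Missing distributions:", missing_distributions(wanted))
```

Run it inside the activated environment. An empty "Missing distributions" list suggests the install step completed, although it does not verify that the installed versions match the lock file.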

Finally, re-authorize your gcloud Application Default Credentials with:
`gcloud auth application-default login`

This obtains your credentials via a web flow and stores them in the well-known location for Application Default Credentials, so any code or SDK you run locally can find them automatically. It is a good stand-in when you want to locally test code that would normally run on a server and use a server-side credentials file.
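To confirm that the login step left credentials where client libraries look for them, the following sketch checks the usual Application Default Credentials file locations with the standard library only. It assumes the default gcloud configuration directory (`~/.config/gcloud` on Linux and macOS, `%APPDATA%\gcloud` on Windows) and does not account for a custom `CLOUDSDK_CONFIG`. The function names are hypothetical helpers, not part of bioforklift or the Google SDKs.

```python
import os
import sys
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def adc_candidate_paths(env=None, platform=None, home=None):
    """List the paths where an ADC file is conventionally looked up, in order."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    candidates = []
    # An explicit credentials file takes precedence over the gcloud location.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(Path(explicit))
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        if appdata:
            candidates.append(Path(appdata) / "gcloud" / ADC_FILENAME)
    else:
        candidates.append(home / ".config" / "gcloud" / ADC_FILENAME)
    return candidates


def find_adc_file(env=None, platform=None, home=None):
    """Return the first candidate path that exists as a file, or None."""
    for path in adc_candidate_paths(env, platform, home):
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_adc_file()
    if found:
        print(f"Found credentials file: {found}")
    else:
        print("No credentials file found; try `gcloud auth application-default login`.")
```

This only checks that a file exists. It does not validate the credentials or their scopes, so a failed API call can still occur even when a file is found.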

### Note
This is a first-time dump of everything I've been putting together for an automation library for our data movement needs.

### Overview
<img src="assets/diagrams/Forklift_Base_Architecture.png" alt="bioforklift Base Architecture" width="800" style="max-width: 100%;" />

### TODO
- Add target workspace entry for Terra class
- Add test suite for BigQuery layer
- Add Terra2Bq integration layer
- Add module-level logging and better error handling
- Define key YAML tags with team
- Test scope of BigQuery range
- Test, Test, Test

The biggest lift is to scope out what we actually want to include in the BigQuery samples class and how we want to name key identifiers in the YAML files, then develop an internal schema for that. After that, we should be flying.

🥶
