Metadata-Version: 2.4
Name: ca-biositing-pipeline
Version: 2026.4.6
Summary: ETL pipeline for the CA Biositing project
Project-URL: Homepage, https://github.com/sustainability-software-lab/ca-biositing
Project-URL: Repository, https://github.com/sustainability-software-lab/ca-biositing.git
Project-URL: Issues, https://github.com/sustainability-software-lab/ca-biositing/issues
Author: SSEC Team
License: BSD 3-Clause License
        
        Copyright (c) 2025, University of Washington, eScience Institute, Scientific Software Engineering Center
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.12
Requires-Dist: asyncpg
Requires-Dist: ca-biositing-datamodels
Requires-Dist: geopandas
Requires-Dist: google-auth-oauthlib
Requires-Dist: gspread
Requires-Dist: gspread-dataframe
Requires-Dist: pandas<3,>=2.2.0
Requires-Dist: prefect<4,>=3
Requires-Dist: pyjanitor<0.32,>=0.31.0
Requires-Dist: pyogrio
Requires-Dist: python-dotenv<2,>=1.0.1
Description-Content-Type: text/markdown

# CA Biositing Pipeline

ETL pipeline for the
[CA Biositing](https://github.com/sustainability-software-lab/ca-biositing)
project — extracting biomass feedstock data from Google Sheets and external
sources, transforming it with pandas, and loading it into PostgreSQL.

Workflows are orchestrated with [Prefect](https://www.prefect.io/) and share
database models from the companion
[`ca-biositing-datamodels`](https://pypi.org/project/ca-biositing-datamodels/)
package.

## Installation

```bash
pip install ca-biositing-pipeline
```

## Quick Start

```python
from ca_biositing.pipeline.flows.primary_ag_product import primary_ag_product_flow

# Run the primary agricultural product ETL flow
primary_ag_product_flow()
```

## What's Included

- **Extract** — Pull data from Google Sheets, shapefiles, and public datasets
  (USDA Census/Survey, LandIQ, Billion Ton)
- **Transform** — Clean and reshape with pandas and pyjanitor
- **Load** — Upsert into PostgreSQL with foreign-key resolution
- **Flows** — Prefect flows combining extract/transform/load steps
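
The load step above can be illustrated with a minimal sketch of an upsert
with foreign-key resolution. It is not the package's actual loader: it
assumes the standard-library `sqlite3` module as a stand-in for PostgreSQL
(both accept the `INSERT ... ON CONFLICT` syntax used here), and the `county`
and `crop_yield` tables, the `resolve_county_id` and `upsert_yields` helpers,
and the sample records are all hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical schema for illustration only; these are NOT the
# ca-biositing-datamodels tables.
SCHEMA = """
CREATE TABLE county (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE crop_yield (
    id        INTEGER PRIMARY KEY,
    county_id INTEGER NOT NULL REFERENCES county(id),
    crop      TEXT NOT NULL,
    tons      REAL NOT NULL,
    UNIQUE (county_id, crop)
);
"""


def resolve_county_id(conn, name):
    """Return the id of the named county, inserting the county if it is new."""
    conn.execute(
        "INSERT INTO county (name) VALUES (?) ON CONFLICT (name) DO NOTHING",
        (name,),
    )
    row = conn.execute("SELECT id FROM county WHERE name = ?", (name,)).fetchone()
    return row[0]


def upsert_yields(conn, records):
    """Insert each record, or update `tons` if (county, crop) already exists."""
    for rec in records:
        # Foreign-key resolution: swap the natural key (county name)
        # for the surrogate key stored in the child table.
        county_id = resolve_county_id(conn, rec["county"])
        conn.execute(
            """
            INSERT INTO crop_yield (county_id, crop, tons)
            VALUES (?, ?, ?)
            ON CONFLICT (county_id, crop) DO UPDATE SET tons = excluded.tons
            """,
            (county_id, rec["crop"], rec["tons"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# First load inserts two rows; the second load updates the Yolo row in place.
upsert_yields(conn, [
    {"county": "Yolo", "crop": "almond", "tons": 10.0},
    {"county": "Kern", "crop": "almond", "tons": 20.0},
])
upsert_yields(conn, [{"county": "Yolo", "crop": "almond", "tons": 12.5}])

rows = conn.execute(
    """
    SELECT c.name, y.crop, y.tons
    FROM crop_yield AS y JOIN county AS c ON c.id = y.county_id
    ORDER BY c.name
    """
).fetchall()
print(rows)
```

Because the conflict target is a unique constraint on the natural key,
re-running the same load does not create duplicate rows, which is the property
that makes a scheduled ETL run safe to repeat.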

## Key Dependencies

- [`ca-biositing-datamodels`](https://pypi.org/project/ca-biositing-datamodels/)
  — shared database models
- [Prefect](https://www.prefect.io/) — workflow orchestration
- [pandas](https://pandas.pydata.org/) — data manipulation
- [gspread](https://docs.gspread.org/) — Google Sheets integration
- [GeoPandas](https://geopandas.org/) — geospatial data handling

## Links

- [Repository](https://github.com/sustainability-software-lab/ca-biositing)
- [Issue Tracker](https://github.com/sustainability-software-lab/ca-biositing/issues)

## Contributors

[![Contributors](https://contrib.rocks/image?repo=sustainability-software-lab/ca-biositing)](https://github.com/sustainability-software-lab/ca-biositing/graphs/contributors)

## Acknowledgement

We acknowledge software engineering support from the University of Washington
[Scientific Software Engineering Center (SSEC)](https://escience.washington.edu/software-engineering/ssec/),
as part of the Schmidt Sciences
[Virtual Institute for Scientific Software (VISS)](https://www.schmidtsciences.org/).

## License

CA Biositing Pipeline is licensed under the open-source
[BSD 3-Clause License](https://opensource.org/license/bsd-3-clause).
