Metadata-Version: 2.4
Name: pygeofetch
Version: 0.1.6
Summary: Universal satellite data download pipeline with unified access to 20+ repositories
Author: pygeofetch Contributors
License: MIT License
        
        Copyright (c) 2024 pygeofetch Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/pygeofetch/pygeofetch
Project-URL: Documentation, https://pygeofetch.readthedocs.io
Project-URL: Repository, https://github.com/pygeofetch/pygeofetch
Project-URL: Bug Tracker, https://github.com/pygeofetch/pygeofetch/issues
Keywords: satellite,remote-sensing,GIS,earth-observation,geospatial
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: GIS
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.27
Requires-Dist: httpx[http2]>=0.27
Requires-Dist: click>=8.1
Requires-Dist: pydantic>=2.5
Requires-Dist: pydantic-settings>=2.1
Requires-Dist: rich>=13.7
Requires-Dist: shapely>=2.0
Requires-Dist: pyproj>=3.6
Requires-Dist: pystac>=1.9
Requires-Dist: cryptography>=42.0
Requires-Dist: keyring>=24.3
Requires-Dist: aiofiles>=23.2
Requires-Dist: tenacity>=8.2
Requires-Dist: PyYAML>=6.0
Requires-Dist: python-dateutil>=2.9
Requires-Dist: tqdm>=4.66
Requires-Dist: anyio>=4.2
Requires-Dist: boto3>=1.34
Requires-Dist: requests>=2.31
Requires-Dist: click-completion>=0.5
Provides-Extra: geo
Requires-Dist: rasterio>=1.3; extra == "geo"
Requires-Dist: geopandas>=0.14; extra == "geo"
Requires-Dist: pyarrow>=15.0; extra == "geo"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: pytest-mock>=3.12; extra == "dev"
Requires-Dist: vcrpy>=6.0; extra == "dev"
Requires-Dist: httpx[mock]>=0.27; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: ruff>=0.3; extra == "dev"
Requires-Dist: mypy>=1.8; extra == "dev"
Requires-Dist: bump2version>=1.0; extra == "dev"
Provides-Extra: all
Requires-Dist: pygeofetch[dev,geo]; extra == "all"
Dynamic: license-file

<div align="center">

<img src="https://appiahkubis14.github.io/portfolio/logo/samuel_logo_dark.svg" alt="PyGeoFetch Logo" width="200"/>

# PyGeoFetch

[![PyPI version](https://badge.fury.io/py/pygeofetch.svg)](https://pypi.org/project/pygeofetch/)
[![Python Versions](https://img.shields.io/pypi/pyversions/pygeofetch.svg)](https://pypi.org/project/pygeofetch/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/appiahkubis14/pygeofetch/actions/workflows/tests.yml/badge.svg)](https://github.com/appiahkubis14/pygeofetch/actions)
[![Coverage](https://codecov.io/gh/appiahkubis14/pygeofetch/branch/main/graph/badge.svg)](https://codecov.io/gh/appiahkubis14/pygeofetch)

**Universal satellite data pipeline — unified access to 22+ satellite repositories with one CLI or Python API.**

</div>

---

## 📖 Introduction

PyGeoFetch is a **production-ready satellite data acquisition framework** that provides unified, authenticated access to 22+ Earth observation repositories — including Sentinel, Landsat, Planet, Maxar, Airbus, Copernicus, USGS, NASA, JAXA, and more — through a single consistent CLI and Python API.

The package abstracts away the authentication complexity, API fragmentation, and format inconsistencies of individual satellite providers, giving researchers and engineers a single command or function call to search, filter, download, and post-process satellite imagery at scale.

PyGeoFetch provides five core capabilities:

1. Authenticated access to 22+ providers, with secure credential storage via system keyring (macOS Keychain, Windows Credential Manager, Linux Secret Service).
2. Unified search across all providers, returning standardized GeoJSON, STAC, GeoParquet, or CSV results sortable by cloud cover, date, score, or satellite.
3. Resilient parallel downloads with checksum verification, resume support, exponential backoff, and atomic writes.
4. A chainable post-processing system for reprojection, compression, NDVI/NDWI computation, resampling, and Cloud Optimized GeoTIFF conversion.
5. YAML pipeline orchestration with cron scheduling, webhook notifications, and full pipeline history — enabling repeatable, automated geospatial data workflows.
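
The exponential backoff mentioned in item 3 can be illustrated with a short sketch. This is not PyGeoFetch's own retry code: the helper name `backoff_delays` and its parameters (`base`, `factor`, `cap`) are hypothetical, chosen only to show how retry delays typically grow and are then capped.

```python
from typing import List


def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0) -> List[float]:
    """Return the wait (in seconds) before each retry attempt.

    Hypothetical helper: the delay is multiplied by `factor` on every
    attempt and never exceeds `cap`.
    """
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


print(backoff_delays(5))            # [1.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_delays(8, cap=30.0))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0, 30.0]
```

Real clients usually add random jitter to each delay so that many workers do not retry at the same instant.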

---

## 📝 Statement of Need

Accessing satellite data at scale is surprisingly fragmented. Each provider — USGS, Copernicus, Planet, Maxar, NASA — exposes a different authentication scheme, a different query API, a different download protocol, and a different file format. Researchers and engineers working across multiple providers must maintain a patchwork of custom scripts, scattered credentials, and ad hoc download logic, making workflows difficult to reproduce and brittle to maintain.

Existing tools address parts of this problem: EODAG supports several providers but lacks pipeline orchestration and commercial coverage; `pystac-client` handles STAC-compliant endpoints only; `sentinelsat` is Sentinel-specific. No single tool covers the full breadth of providers needed for operational geospatial workflows.

PyGeoFetch addresses this gap by providing:

- A **single CLI** that works identically across all supported providers.
- A **unified Python API** with standardized result and download models, removing provider-specific boilerplate.
- A **pipeline layer** for scheduling recurring data acquisitions without external orchestration tools.
- **Production-grade resilience** — circuit breakers, retries, resume, checksum verification, and atomic writes — that make satellite downloads reliable enough for automated pipelines.

---

## 🚀 Key Features

### 🛰️ 22+ Satellite Providers
- Open-access providers with no login required: Microsoft Planetary Computer, AWS Earth, Element 84, NOAA Big Data, ESA SciHub mirrors, JAXA, ISRO Bhuvan, INPE CBERS, DigitalGlobe Open Data
- Credentialled providers: USGS, Copernicus CDSE, NASA Earthdata, Planet Labs, Sentinel Hub, Maxar GBDX, Airbus OneAtlas, Alaska Satellite Facility, Google Earth Engine, OpenTopography, TerraBotics
- Covers optical, SAR, DEM, and LiDAR collections across providers, including sub-metre imagery and STAC-based catalogues

### 🔍 Unified Search
- Single query across multiple providers simultaneously with merged, deduplicated results
- Filter by bounding box, geometry file, date range, cloud cover, resolution, processing level, and CQL2 expressions
- Output to table, JSON, GeoJSON, STAC ItemCollection, GeoParquet, CSV, or scene IDs
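
As a rough illustration of what "merged, deduplicated results" can mean, the sketch below combines per-provider result lists and keeps one record per scene ID. The record shape (plain dicts with `id`, `provider`, and `cloud_cover` keys) and the function name `merge_results` are assumptions made for this example, not PyGeoFetch's actual result model.

```python
from typing import Dict, Iterable, List


def merge_results(*provider_results: Iterable[Dict]) -> List[Dict]:
    """Merge per-provider result lists, keeping the first record per scene ID."""
    seen = set()
    merged = []
    for results in provider_results:
        for item in results:
            if item["id"] not in seen:
                seen.add(item["id"])
                merged.append(item)
    return merged


# Hypothetical records: the same scene is offered by two providers.
aws = [{"id": "S2A_001", "provider": "aws_earth", "cloud_cover": 4.0}]
pc = [
    {"id": "S2A_001", "provider": "planetary_computer", "cloud_cover": 4.0},
    {"id": "S2B_002", "provider": "planetary_computer", "cloud_cover": 12.5},
]

merged = sorted(merge_results(aws, pc), key=lambda r: r["cloud_cover"])
print([r["id"] for r in merged])  # ['S2A_001', 'S2B_002']
```

Here the first provider listed wins when a scene appears twice; a real implementation might instead prefer the provider with the cheapest or fastest download.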

### 📥 Resilient Downloads
- Adaptive parallel downloads with configurable concurrency
- SHA256 checksum verification, resume support, and exponential-backoff retries
- Atomic writes: a file is moved into place only once it is complete, so no partial file is left at the destination path
- Bandwidth throttling and webhook notifications on completion or failure
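
The combination of checksum verification and atomic writes is commonly built on a "write to a temporary file, verify, then rename" pattern. The following is a minimal standard-library sketch of that pattern, not the package's actual download code; the function name `atomic_write_verified` and its signature are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable


def atomic_write_verified(chunks: Iterable[bytes], dest: Path,
                          expected_sha256: str) -> None:
    """Write chunks to a temp file, verify SHA256, then rename into place."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    # The temp file lives next to the destination so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                digest.update(chunk)
                tmp.write(chunk)
        if digest.hexdigest() != expected_sha256:
            raise ValueError(f"checksum mismatch for {dest.name}")
        os.replace(tmp_name, dest)  # rename into place; atomic on POSIX
    except BaseException:
        # Clean up the temp file so a failed download leaves nothing behind.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


payload = [b"scene-", b"bytes"]
checksum = hashlib.sha256(b"".join(payload)).hexdigest()
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "scene.tif"
    atomic_write_verified(payload, target, checksum)
    print(target.read_bytes())  # b'scene-bytes'
```

Because the destination path only ever appears after a successful rename, a reader of the output directory never sees a half-written file.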

### ⚙️ Post-Processing Chains
Chain operations that run immediately after download:

`unzip` → `reproject:EPSG:4326` → `compress:lzw` → `ndvi` → `resample:10` → `cog`

Full list: unzip, reproject, compress, NDVI, NDWI, composite, atmospheric correction, clip, resample, Cloud Optimized GeoTIFF, merge, pan-sharpen.
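
A chain such as `unzip,reproject:EPSG:4326,compress:lzw,cog` mixes bare actions with `action:argument` pairs, and some arguments contain a colon themselves. The sketch below shows one way such a string could be parsed; `parse_chain` is a hypothetical helper written for this example and may not match how PyGeoFetch parses chains internally.

```python
from typing import List, Optional, Tuple


def parse_chain(spec: str) -> List[Tuple[str, Optional[str]]]:
    """Split 'a,b:arg' into [(action, arg_or_None), ...]."""
    steps = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        # Split on the first colon only, so 'reproject:EPSG:4326' keeps 'EPSG:4326' whole.
        action, _, arg = token.partition(":")
        steps.append((action, arg or None))
    return steps


print(parse_chain("unzip,reproject:EPSG:4326,compress:lzw,cog"))
# [('unzip', None), ('reproject', 'EPSG:4326'), ('compress', 'lzw'), ('cog', None)]
```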

### 📋 YAML Pipeline Orchestration
- Define search → filter → download → export workflows in YAML
- Schedule with cron expressions, list history, retry failed runs
- Docker-ready for containerized pipeline deployments

### 🔐 Security by Default
- Credentials stored in system keyring — never logged or written to disk in plaintext
- TLS 1.2+ enforced, SSL verification always on, no telemetry, no analytics

---

## 📦 Installation

```bash
# Core
pip install pygeofetch

# + Raster/vector processing (rasterio, geopandas, pyarrow)
pip install "pygeofetch[geo]"

# + Everything
pip install "pygeofetch[all]"
```

**Requirements:** Python 3.9+

---

## ⚡ Quick Start

### CLI

```bash
# Add credentials
pygeofetch auth add usgs --username USER --password PASS
pygeofetch auth add planet --api-key YOUR_KEY

# Search
pygeofetch search run \
    --bbox "-74.1,40.6,-73.7,40.9" \
    --start-date 2024-01-01 \
    --cloud-cover 0-15 \
    --providers planetary_computer \
    --output results.geojson

# Download with post-processing
pygeofetch download run \
    --from-search results.geojson \
    --output ./data/ \
    --parallel 4 \
    --verify-checksum \
    --post-process "unzip,reproject:EPSG:4326,compress:lzw,cog"
```

### Python API

```python
from pathlib import Path
from pygeofetch import pygeofetch
from pygeofetch.models import SearchQuery, DownloadOptions

sb = pygeofetch()
sb.add_credentials("usgs", username="user", password="pass")
sb.add_credentials("planet", api_key="PL_KEY")

results = sb.search(
    SearchQuery(
        bbox=(-74.1, 40.6, -73.7, 40.9),
        start_date="2024-01-01",
        end_date="2024-06-01",
        cloud_cover_max=20,
    ),
    providers=["usgs", "copernicus", "planetary_computer", "aws_earth"],
)

downloads = sb.download(
    results[:5],
    destination=Path("./data/"),
    options=DownloadOptions(parallel=4, verify_checksum=True, resume=True),
)

for dr in downloads:
    if dr.success:
        # True division keeps the fractional megabytes that `//` would discard.
        print(f"✓ {dr.data_id} ({dr.bytes_downloaded / 1024 / 1024:.1f} MB)")
    else:
        print(f"✗ {dr.data_id}: {dr.error}")
```
```

### YAML Pipeline

```yaml
name: weekly-sentinel2-ndvi
schedule: "0 6 * * 1"
steps:
  - search:
      providers: [copernicus, aws_earth, planetary_computer]
      date_range: last_7_days
      cloud_cover: 0-10
      bbox: "-74.1,40.6,-73.7,40.9"
  - filter:
      expression: "data.cloud_cover < 5"
  - download:
      parallel: 4
      output: ./raw/
      verify_checksum: true
      post_process: "unzip,reproject:EPSG:4326,cog"
  - export:
      format: cloud_optimized_geotiff
      destination: s3://my-bucket/ndvi/
```

```bash
pygeofetch pipeline run weekly-sentinel2.yaml
pygeofetch pipeline schedule weekly-sentinel2.yaml
```

---

## 📋 Documentation

Comprehensive documentation is available at **https://appiahkubis14.github.io/pygeofetch-docs/**, including:

- Full CLI reference
- Provider authentication guides
- Pipeline configuration reference
- Post-processing action catalogue
- Contributing guide

---

## 🤝 Contributing

Contributions of all kinds are welcome. See [CONTRIBUTING.md](https://github.com/appiahkubis14/pygeofetch?tab=contributing-ov-file) for full guidelines.

Good first issues include turning stub providers into full API integrations, improving test coverage, and adding new post-processing actions.

<!-- ```bash
git clone https://github.com/appiahkubis14/pygeofetch
cd pygeofetch
pip install -e ".[dev,all]"
pytest tests/ -v
``` -->

---

## 📄 License

PyGeoFetch is free and open source software, licensed under the [MIT License](LICENSE).
