Metadata-Version: 2.4
Name: pygeovision
Version: 1.0.0
Summary: World-class Geospatial AI Platform — pygeofetch + geoai + PyGeoVision
Author: PyGeoVision Contributors
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           Copyright 2024 PyGeoVision Contributors
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License.
        
Project-URL: Homepage, https://github.com/pygeovision/pygeovision
Project-URL: Documentation, https://docs.pygeovision.ai
Project-URL: Repository, https://github.com/pygeovision/pygeovision
Keywords: geospatial,satellite,remote-sensing,deep-learning,computer-vision,GIS,earth-observation,AI,pygeofetch,geoai,sentinel,landsat
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: GIS
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pygeofetch>=0.1.0
Requires-Dist: pystac>=1.9
Requires-Dist: pystac-client>=0.7
Requires-Dist: planetary-computer>=1.0
Requires-Dist: httpx>=0.27
Requires-Dist: shapely>=2.0
Requires-Dist: pyproj>=3.6
Requires-Dist: boto3>=1.34
Requires-Dist: tenacity>=8.2
Requires-Dist: pydantic>=2.0
Requires-Dist: click>=8.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.28
Requires-Dist: numpy>=1.24
Requires-Dist: rich>=13.0
Requires-Dist: tqdm>=4.66
Provides-Extra: geoai
Requires-Dist: geoai-py>=0.39.0; extra == "geoai"
Provides-Extra: extra
Requires-Dist: geoai-py[extra]; extra == "extra"
Provides-Extra: geo
Requires-Dist: rasterio>=1.3; extra == "geo"
Requires-Dist: geopandas>=1.0; extra == "geo"
Requires-Dist: rioxarray>=0.15; extra == "geo"
Requires-Dist: pyogrio>=0.7; extra == "geo"
Provides-Extra: ai
Requires-Dist: torch>=2.0; extra == "ai"
Requires-Dist: torchvision>=0.15; extra == "ai"
Requires-Dist: timm>=0.9; extra == "ai"
Requires-Dist: transformers>=4.30; extra == "ai"
Requires-Dist: segmentation-models-pytorch>=0.3; extra == "ai"
Requires-Dist: albumentations>=1.3; extra == "ai"
Requires-Dist: scikit-learn>=1.3; extra == "ai"
Requires-Dist: Pillow>=9.0; extra == "ai"
Requires-Dist: rasterio>=1.3; extra == "ai"
Requires-Dist: pyproj>=3.5; extra == "ai"
Requires-Dist: shapely>=2.0; extra == "ai"
Provides-Extra: labelers
Requires-Dist: pygeovision[geo]; extra == "labelers"
Requires-Dist: requests>=2.28; extra == "labelers"
Provides-Extra: foundation
Requires-Dist: pygeovision[ai]; extra == "foundation"
Requires-Dist: open-clip-torch>=2.20; extra == "foundation"
Requires-Dist: huggingface-hub>=0.18; extra == "foundation"
Provides-Extra: mlflow
Requires-Dist: mlflow>=2.8; extra == "mlflow"
Provides-Extra: wandb
Requires-Dist: wandb>=0.16; extra == "wandb"
Provides-Extra: labelstudio
Requires-Dist: label-studio-sdk>=0.8; extra == "labelstudio"
Provides-Extra: onnx
Requires-Dist: onnx>=1.14; extra == "onnx"
Requires-Dist: onnxruntime>=1.16; extra == "onnx"
Provides-Extra: all
Requires-Dist: pygeovision[extra,foundation,geo,geoai,labelstudio,mlflow,onnx,wandb]; extra == "all"
Provides-Extra: dev
Requires-Dist: pygeovision[all]; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: mypy>=1.5; extra == "dev"
Requires-Dist: pre-commit>=3.0; extra == "dev"
Requires-Dist: ipython>=8.0; extra == "dev"
Requires-Dist: jupyter>=1.0; extra == "dev"
Requires-Dist: matplotlib>=3.7; extra == "dev"
Requires-Dist: responses>=0.25; extra == "dev"
Dynamic: license-file

<div align="center">

<img src="https://img.shields.io/badge/PyGeoVision-1.0.0-0d1117?style=for-the-badge&labelColor=0d1117&color=2563eb" alt="version"/>

# PyGeoVision

### World-Class Geospatial AI Platform

**The definitive Python framework for satellite data acquisition and geospatial AI —  
unifying [PyGeoFetch](https://github.com/appiahkubis14/PyGeoFetch) (22+ providers) and [GeoAI](https://opengeoai.org) (full AI stack) in one coherent API.**

---

[![Python](https://img.shields.io/badge/Python-3.10%20|%203.11%20|%203.12-3776ab?style=flat-square&logo=python&logoColor=white)](https://pypi.org/project/pygeovision/)
[![PyPI](https://img.shields.io/badge/PyPI-v1.0.0-2563eb?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/pygeovision/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green?style=flat-square)](LICENSE)
[![Tests](https://img.shields.io/badge/Tests-208_passing-22c55e?style=flat-square&logo=pytest&logoColor=white)](#testing)
[![PyGeoFetch](https://img.shields.io/badge/PyGeoFetch-22_providers-f59e0b?style=flat-square)](https://github.com/appiahkubis14/PyGeoFetch)
[![GeoAI](https://img.shields.io/badge/GeoAI-24_subsystems-a855f7?style=flat-square)](https://opengeoai.org)

</div>

---

## What is PyGeoVision?

PyGeoVision is a **production-ready geospatial AI platform** that bridges two world-class open-source packages:

| Layer | Package | Responsibility |
|-------|---------|---------------|
| 🛰️ **Data** | [PyGeoFetch](https://github.com/appiahkubis14/PyGeoFetch) | Search & download satellite data from 22+ providers (Sentinel, Landsat, Planet, Maxar, USGS, Copernicus and more) with auth, caching, parallel downloads, post-processing, and YAML pipeline orchestration |
| 🤖 **AI** | [GeoAI](https://opengeoai.org) | Full AI stack: segmentation, detection, classification, change detection, SAM, foundation models (Prithvi, DINOv3), embeddings, cloud masking, super-resolution, ONNX export |
| 🔗 **Bridge** | **PyGeoVision** | Unified API, 10 end-to-end pipelines, CLI, experiment tracking, distributed training, automated labeling |

> **Design principle:** PyGeoVision never reimplements PyGeoFetch or GeoAI. All data operations delegate to PyGeoFetch. All AI operations delegate to GeoAI. PyGeoVision is the integration layer that makes them work seamlessly together.

---

## Architecture Overview

```mermaid
graph TB
    subgraph PGV["🔗 PyGeoVision Platform"]
        direction TB
        
        subgraph CLI["⌨️  CLI  (pygeovision)"]
            C1[data auth/search/download]
            C2[ai segment/detect/train]
            C3[pipeline building_footprints ...]
            C4[models list/info]
        end

        subgraph API["🐍  Python API"]
            A1[PyGeoVision]
            A2[PyGeoVisionClient]
        end

        subgraph PF["🛰️  Data Layer — PyGeoFetch"]
            direction LR
            PF1[SatelliteFetcher\nCLI subprocess + pystac_client fallback]
            PF2[DataPipeline\nYAML pipeline builder]
            PF3[Providers Registry\n22 providers]
        end

        subgraph GA["🤖  AI Layer — GeoAI"]
            direction LR
            GA1[GeoAIEngine\n24 subsystem proxies]
            GA2[segment · detect · classify\nchange · train · infer · embed]
            GA3[sam · prithvi · cloud · sr\nonnx · canopy · dinov3 · tessera]
        end

        subgraph PIPE["⚙️  End-to-End Pipelines  (10)"]
            P1[building_footprints\nchange_detection\nland_cover]
            P2[water_bodies\nsolar_detection\ncrop_monitoring]
            P3[disaster_assessment\ndeforestation\nurban_growth\ncarbon_estimation]
        end

        subgraph OWN["🧩  PyGeoVision Own AI Stack"]
            direction LR
            O1[14 Model Architectures\nUNet · SegFormer · FCOS · ViT]
            O2[GeoTrainer\n6 losses · distributed]
            O3[TiledInference\nPostProcessor · Ensemble]
            O4[7 Labelers\nOSM · SAM · ESA · MS · Google]
            O5[ExperimentTracker\nDriftDetector]
        end
    end

    subgraph EXTERNAL["☁️  External Services"]
        E1["🛰️ Planetary Computer\nAWS Earth · Element 84\nCopernicus · USGS · NASA"]
        E2["🔐 Planet Labs\nMaxar · Airbus · Sentinel Hub\nASF · OpenTopography · GEE"]
        E3["🤗 HuggingFace Hub\nNASA Prithvi · DINOv3\ntimm · RF-DETR"]
    end

    C1 & C2 & C3 & C4 --> A1
    A1 --> PF1 & GA1 & PIPE
    A2 --> A1
    PF1 --> PF2 & PF3
    GA1 --> GA2 & GA3
    PIPE --> PF1
    PIPE --> GA1
    A1 --> OWN
    PF1 -->|"pygeofetch search run\npygeofetch download run\npygeofetch pipeline run"| E1 & E2
    GA1 -->|geoai API calls| E3

    style PGV fill:#0d1117,stroke:#2563eb,stroke-width:2px,color:#e2e8f0
    style PF fill:#1e293b,stroke:#f59e0b,stroke-width:1px,color:#e2e8f0
    style GA fill:#1e293b,stroke:#a855f7,stroke-width:1px,color:#e2e8f0
    style PIPE fill:#1e293b,stroke:#22c55e,stroke-width:1px,color:#e2e8f0
    style OWN fill:#1e293b,stroke:#64748b,stroke-width:1px,color:#e2e8f0
    style CLI fill:#0f172a,stroke:#64748b,stroke-width:1px,color:#e2e8f0
    style API fill:#0f172a,stroke:#64748b,stroke-width:1px,color:#e2e8f0
    style EXTERNAL fill:#0f172a,stroke:#475569,stroke-width:1px,color:#e2e8f0
```

---

## Data Flow: Search → Download → AI → Output

```mermaid
sequenceDiagram
    participant User
    participant PGV as PyGeoVision
    participant PGF as PyGeoFetch CLI
    participant STAC as STAC Provider<br/>(Planetary Computer etc.)
    participant GA as GeoAI
    participant HF as HuggingFace Hub

    User->>PGV: client.search(bbox, date_range, providers, cloud_cover_max)
    PGV->>PGF: pygeofetch search run --bbox ... --providers ...
    PGF->>STAC: STAC search API (22 providers)
    STAC-->>PGF: GeoJSON FeatureCollection (100 scenes)
    PGF-->>PGV: results.geojson
    PGV-->>User: List[SearchResult]

    User->>PGV: client.download(results[:5], post_process=["unzip","cog"])
    PGV->>PGF: pygeofetch download run --from-search ...
    PGF->>STAC: Parallel HTTP downloads (4 workers)
    STAC-->>PGF: .SAFE / .tif / .zip files
    PGF->>PGF: Post-process: unzip → reproject → compress → COG
    PGF-->>PGV: DownloadResult (path, bytes, duration)
    PGV-->>User: List[DownloadResult]

    User->>PGV: client.geoai.segment.buildings(path, output_vector="out.geojson")
    PGV->>GA: geoai.BuildingFootprintExtractor().predict(...)
    GA->>HF: Download pretrained model weights
    HF-->>GA: checkpoint.pth
    GA->>GA: Tiled inference (512×512, Gaussian blend)
    GA->>GA: Vectorize → smooth → regularize polygons
    GA-->>PGV: GeoJSON building footprints
    PGV-->>User: output_path, stats
```

---

## Installation

```bash
# Core — PyGeoVision + PyGeoFetch integration
pip install pygeovision

# + Full GeoAI stack (PyTorch, transformers, SMP, leafmap, timm, torchgeo)
pip install "pygeovision[geoai]"

# + Raster/vector processing (rasterio, geopandas, rioxarray)
pip install "pygeovision[geo]"

# + GeoAI's own optional extras (installs geoai-py[extra])
pip install "pygeovision[extra]"

# + Everything
pip install "pygeovision[all]"
```

**Requirements:** Python 3.10+ · PyGeoFetch (installed automatically as a core dependency) · GeoAI (optional; `pip install "pygeovision[geoai]"` or `pip install geoai-py`)

---

## Quick Start

### Five-Minute Walkthrough

```python
import pygeovision as pgv

# ─── Initialise ─────────────────────────────────────────────────────────────
client = pgv.PyGeoVision()
print(client)  # PyGeoVision(v1.0.0, pygeofetch=✓, geoai=✓)

# ─── 1. Authenticate providers (stored securely in system keyring) ───────────
# Open access — no credentials needed:
#   planetary_computer, aws_earth, element84, noaa_big_data,
#   esa_scihub, jaxa_earth, isro_bhuvan, inpe_cbers, digitalglobe,
#   geoserver_generic

# Credentialled providers:
client.add_credentials("usgs", username="user", password="pass")
client.add_credentials("planet", api_key="PL-xxxx")
client.add_credentials("copernicus", client_id="id", client_secret="secret")

# ─── 2. Search satellite data via PyGeoFetch ────────────────────────────────
results = client.search(
    bbox=(-0.15, 51.47, -0.10, 51.52),          # London, WGS84
    date_range=("2024-06-01", "2024-06-30"),
    providers=["planetary_computer", "copernicus", "usgs"],
    collections=["sentinel-2-l2a"],              # or use satellite="sentinel-2"
    cloud_cover_max=10,
    max_results=50,
    sort_by="cloud_cover",
)
print(f"Found {len(results)} scenes")
for r in results[:3]:
    print(r)
# [planetary_computer] Sentinel-2A | 2024-06-03 | cloud=0% score=0.99 | S2A_MSIL2A_...

# ─── 3. Download with post-processing via PyGeoFetch ────────────────────────
downloads = client.download(
    results[:3],
    output_dir="./sentinel2/",
    parallel=4,
    verify_checksum=True,
    resume=True,
    post_process=["unzip", "reproject:EPSG:4326", "compress:lzw", "cog"],
)
for d in downloads:
    print(d)  # ✓ scene-id (245.3 MB, 12.1s) → ./sentinel2/scene.tif

# ─── 4. AI: Segment buildings with GeoAI ────────────────────────────────────
client.geoai.segment.buildings(
    downloads[0].path,
    output_path="buildings.tif",
    output_vector="buildings.geojson",
    confidence_threshold=0.5,
)

# ─── 5. AI: Detect changes between two dates ────────────────────────────────
client.geoai.change.detect(
    "scene_2020.tif",
    "scene_2024.tif",
    output_path="changes.tif",
)

# ─── 6. AI: Train a custom segmentation model ────────────────────────────────
client.geoai.train.segmentation(
    "./building_chips/",
    "building_model.pth",
    num_classes=2,
    epochs=100,
    backbone="efficientnet-b4",
    batch_size=16,
)

# ─── 7. End-to-end pipeline (PyGeoFetch data + GeoAI inference) ─────────────
result = client.pipeline(
    "building_footprints",
    bbox=(-0.15, 51.47, -0.10, 51.52),
    date="2024-06",
    output_dir="./results/",
)
print(result.output_path)   # ./results/building_footprints/prediction.tif
print(result.stats)         # {"buildings_detected": 1847, "coverage_pct": 0.312}
```

---

## PyGeoFetch Integration — Satellite Data (22 Providers)

PyGeoVision uses **PyGeoFetch** as its exclusive data backend. The `pygeofetch` CLI is called via subprocess for the full 22-provider experience, with `pystac_client` + `planetary_computer` as a Python fallback for STAC providers.
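The CLI-first, STAC-fallback behaviour described above can be pictured with a small standard-library sketch. It is illustrative only and is not PyGeoVision's actual implementation: `choose_backend` and `build_search_command` are hypothetical helpers, and the comma-separated argument format is an assumption. Only the `--bbox` and `--providers` flags are used because they are the only ones shown in this README.

```python
import shutil


def choose_backend(cli_name="pygeofetch"):
    """Return "cli" when the executable is on PATH, otherwise "stac"."""
    return "cli" if shutil.which(cli_name) else "stac"


def build_search_command(bbox, providers):
    """Build an argv list for `pygeofetch search run`.

    The comma-separated value format is an assumption; check
    `pygeofetch search run --help` for the real syntax.
    """
    return [
        "pygeofetch", "search", "run",
        "--bbox", ",".join(str(v) for v in bbox),
        "--providers", ",".join(providers),
    ]


backend = choose_backend()
command = build_search_command((-0.15, 51.47, -0.10, 51.52), ["planetary_computer"])
print(backend, command)
```

When `backend` is `"cli"`, the list could be handed to `subprocess.run(command, check=True)`; when it is `"stac"`, a `pystac_client` search would be used instead.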

### Provider Registry

| Provider ID | Name | Auth | Key Satellites | SAR | Sub-m | STAC |
|-------------|------|------|---------------|:---:|:-----:|:----:|
| `planetary_computer` | Microsoft Planetary Computer | 🌐 Open | Sentinel-1/2, Landsat, MODIS, NAIP | ✓ | | ✓ |
| `aws_earth` | AWS Earth Open Data | 🌐 Open | Sentinel-2 COGs, Landsat, NAIP | | | ✓ |
| `element84` | Element 84 Earth Search | 🌐 Open | Sentinel-2 COGs, Landsat Col 2 | | | ✓ |
| `noaa_big_data` | NOAA Big Data | 🌐 Open | GOES-16/17/18, NEXRAD | | | |
| `esa_scihub` | ESA SciHub Mirror | 🌐 Open | Copernicus public mirrors | ✓ | | |
| `jaxa_earth` | JAXA ALOS World | 🌐 Open | ALOS 30m DSM, PALSAR | ✓ | | |
| `isro_bhuvan` | ISRO Bhuvan | 🌐 Open | ResourceSat, Cartosat, Oceansat | | | |
| `inpe_cbers` | INPE CBERS | 🌐 Open | CBERS-4/4A | | | |
| `digitalglobe` | DigitalGlobe Open Data | 🌐 Open | Disaster response VHR | | ✓ | |
| `geoserver_generic` | GeoServer Generic OGC | 🌐 Open | Any WMS/WCS/WFS service | | | |
| `usgs` | USGS Earth Explorer | 🔐 User/Pass | Landsat 1-9, ASTER, MODIS | | | |
| `copernicus` | Copernicus CDSE | 🔐 OAuth2 | Sentinel-1/2/3/5P | ✓ | | ✓ |
| `nasa_earthdata` | NASA Earthdata CMR | 🔐 OAuth2 | MODIS, VIIRS, ICESat-2, GEDI | | | ✓ |
| `nasa_earthdata_cloud` | NASA Earthdata Cloud | 🔐 OAuth2+S3 | Cloud-hosted NASA data | | | ✓ |
| `opentopography` | OpenTopography | 🔐 API Key | SRTM, Copernicus DEM 30/90m, LiDAR | | | |
| `planet` | Planet Labs | 🔐 API Key | PlanetScope 3–5m, SkySat 0.5m | | ✓ | ✓ |
| `sentinel_hub` | Sentinel Hub | 🔐 OAuth2 | All Sentinels, Landsat, MODIS | ✓ | | |
| `maxar_gbdx` | Maxar GBDX | 🔐 Token | WorldView 1–4, GeoEye-1 (30cm) | | ✓ | |
| `airbus_oneatlas` | Airbus OneAtlas | 🔐 API Key | Pléiades 50cm, SPOT 6/7 1.5m | | ✓ | ✓ |
| `alaska_satellite_facility` | Alaska Satellite Facility | 🔐 Earthdata | Sentinel-1, ALOS PALSAR | ✓ | | |
| `google_earth_engine` | Google Earth Engine | 🔐 Service Acct | Multi-petabyte global catalog | ✓ | | |
| `terrabotics` | TerraBotics | 🔐 API Key | Archive + tasking | | ✓ | |

### Authentication

```python
# User/password (USGS, NASA Earthdata)
client.add_credentials("usgs", username="user", password="pass")
client.add_credentials("nasa_earthdata", username="user", password="pass")

# API key (Planet, OpenTopography, Airbus)
client.add_credentials("planet", api_key="PL-xxxx")
client.add_credentials("opentopography", api_key="OT-xxxx")
client.add_credentials("airbus_oneatlas", api_key="AB-xxxx")

# OAuth2 client credentials (Copernicus, Sentinel Hub)
client.add_credentials("copernicus", client_id="id", client_secret="secret")
client.add_credentials("sentinel_hub", client_id="id", client_secret="secret")

# Chaining
client \
    .add_credentials("usgs", username="u", password="p") \
    .add_credentials("planet", api_key="PL-xxxx") \
    .add_credentials("copernicus", client_id="id", client_secret="secret")

# Test connectivity
client.test_provider("planetary_computer")  # True

# List stored credentials
client.data.list_credentials()  # ['usgs', 'planet', 'copernicus']
```
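The examples above pass credentials as string literals for brevity. In practice it is usually safer to keep secrets out of source code. The sketch below reads them from environment variables; `credentials_from_env` and the `PGV_<PROVIDER>_<FIELD>` naming scheme are hypothetical and not part of PyGeoVision.

```python
import os


def credentials_from_env(provider, fields, environ=os.environ):
    """Collect credential fields for one provider from environment variables.

    Looks up PGV_<PROVIDER>_<FIELD> (a hypothetical naming scheme) and raises
    KeyError if any field is missing, so misconfiguration fails early.
    """
    values = {}
    for field in fields:
        key = f"PGV_{provider}_{field}".upper()
        if key not in environ:
            raise KeyError(f"missing environment variable {key}")
        values[field] = environ[key]
    return values


# Demo with an in-memory mapping instead of the real environment.
fake_env = {"PGV_PLANET_API_KEY": "PL-xxxx"}
planet_kwargs = credentials_from_env("planet", ["api_key"], environ=fake_env)
print(planet_kwargs)  # {'api_key': 'PL-xxxx'}
```

The resulting dict can then be unpacked into the call shown above, for example `client.add_credentials("planet", **planet_kwargs)`.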

### Search API

```python
# By provider
results = client.search(
    bbox=(-74.1, 40.6, -73.7, 40.9),
    date_range=("2024-01-01", "2024-06-01"),
    providers=["planetary_computer", "copernicus", "usgs"],
    cloud_cover_max=15,
    max_results=100,
    sort_by="cloud_cover",          # datetime | cloud_cover | score | satellite
    sort_order="asc",
)

# By satellite shortcut (auto-selects providers)
results = client.search(
    bbox=..., date_range=..., satellite="sentinel-2",    # or "landsat", "planet", "dem"
)

# By STAC collection
results = client.search(
    bbox=..., date_range=...,
    collections=["sentinel-2-l2a", "landsat-c2-l2"],
)

# Advanced: CQL2 filter expression
results = client.search(
    bbox=..., date_range=...,
    cql2_filter="eo:cloud_cover < 5 AND platform = 'sentinel-2a'",
)

# SearchResult properties
r = results[0]
r.id              # 'S2A_MSIL2A_20240603T153811_R001'
r.provider        # 'planetary_computer'
r.satellite       # 'Sentinel-2A'
r.date            # '2024-06-03'
r.cloud_cover     # 0.0
r.bbox            # (-0.15, 51.47, -0.10, 51.52)
r.score           # 0.99
r.collection      # 'sentinel-2-l2a'
r.resolution_m    # 10.0
r.is_sar          # False
r.to_dict()       # JSON-serializable dict
r.to_stac_item()  # pystac Item object
```
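To make the `cloud_cover_max` and `sort_by="cloud_cover"` semantics concrete, here is a minimal sketch that applies the same idea to a plain GeoJSON FeatureCollection, the format the search step is described as returning. It relies on the STAC `eo:cloud_cover` property. The helper name, the inclusive threshold, and the treatment of scenes without a cloud-cover value are assumptions, not documented PyGeoVision behaviour.

```python
def cloud_cover(feature):
    """Read eo:cloud_cover, treating a missing value as fully cloudy."""
    return feature["properties"].get("eo:cloud_cover", 100.0)


def filter_and_sort(feature_collection, cloud_cover_max):
    """Keep features at or below the limit, least cloudy first."""
    kept = [f for f in feature_collection["features"] if cloud_cover(f) <= cloud_cover_max]
    return sorted(kept, key=cloud_cover)


# Hand-written sample data, not real search output.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"id": "a", "properties": {"eo:cloud_cover": 12.0}},
        {"id": "b", "properties": {"eo:cloud_cover": 3.5}},
        {"id": "c", "properties": {"eo:cloud_cover": 0.0}},
        {"id": "d", "properties": {}},
    ],
}
selected_ids = [f["id"] for f in filter_and_sort(sample, cloud_cover_max=10)]
print(selected_ids)  # ['c', 'b']
```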

### Download API

```python
downloads = client.download(
    results[:5],
    output_dir="./data/",
    parallel=4,                # concurrent downloads
    verify_checksum=True,      # SHA256 verification
    resume=True,               # resume interrupted downloads
    retry_attempts=5,          # exponential backoff
    bandwidth_limit_mb=20.0,   # throttle in MB/s
    on_failure="skip",         # skip | abort | retry
    post_process=[
        "unzip",                       # extract ZIP/TAR archives
        "reproject:EPSG:4326",         # reproject to target CRS
        "compress:lzw",                # apply compression (lzw | deflate | zstd)
        "ndvi",                        # compute NDVI band
        "ndwi",                        # compute NDWI band
        "resample:10",                 # resample to N metres
        "cog",                         # Cloud Optimized GeoTIFF
    ],
)

# DownloadResult properties
d = downloads[0]
d.success           # True
d.path              # Path('./data/S2A_MSIL2A_20240603_visual.tif')
d.size_mb           # 245.3
d.duration_seconds  # 12.1
d.checksum_verified # True
```
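The `post_process` entries follow a `name` or `name:argument` pattern. Some arguments contain a colon themselves (`reproject:EPSG:4326`), so a parser has to split on the first colon only. The sketch below shows one way to do that; `parse_step` is a hypothetical helper and says nothing about how PyGeoFetch parses these strings internally.

```python
def parse_step(step):
    """Split "name:argument" on the first colon; argument is None if absent."""
    name, _, argument = step.partition(":")
    return name, (argument or None)


steps = ["unzip", "reproject:EPSG:4326", "compress:lzw", "resample:10", "cog"]
parsed = [parse_step(step) for step in steps]
for name, argument in parsed:
    print(f"{name:10s} {argument}")
```

Splitting with `partition` rather than `split(":")` keeps `EPSG:4326` intact as a single argument.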

### Pipeline Builder (YAML Orchestration)

```python
# Build a recurring pipeline programmatically
pipeline = (
    client.create_pipeline("weekly-sentinel2-london", description="Weekly S2 NDVI")
    .search(
        providers=["planetary_computer", "copernicus"],
        bbox=(-0.15, 51.47, -0.10, 51.52),
        date_range="last_7_days",        # last_7_days | last_30_days | this_month
        cloud_cover="0-10",
        max_results=20,
    )
    .filter("data.cloud_cover < 5")
    .download(
        parallel=4,
        output="./raw/",
        verify_checksum=True,
        post_process=["unzip", "reproject:EPSG:4326", "cog"],
    )
    .export(
        format="cloud_optimized_geotiff",
        destination="s3://my-bucket/london/",
    )
    .set_schedule("0 6 * * 1")           # Every Monday 06:00 UTC
)

# Save and run
pipeline.save("weekly-sentinel2.yaml")
pipeline.run()                           # delegates to: pygeofetch pipeline run

# Or run a YAML file directly
client.run_pipeline_yaml("weekly-sentinel2.yaml")
client.run_pipeline_yaml("weekly-sentinel2.yaml", step="download")

# Schedule, list, inspect
client.data.schedule_pipeline("weekly-sentinel2.yaml", cron="0 6 * * 1")
client.data.list_scheduled_pipelines()
client.data.pipeline_history(limit=20)
```

**Generated YAML:**
```yaml
name: weekly-sentinel2-london
description: Weekly S2 NDVI
schedule: 0 6 * * 1
steps:
  - search:
      providers: [planetary_computer, copernicus]
      bbox: "-0.15,51.47,-0.10,51.52"
      date_range: last_7_days
      cloud_cover: 0-10
      max_results: 20
  - filter:
      expression: data.cloud_cover < 5
  - download:
      parallel: 4
      output: ./raw/
      verify_checksum: true
      post_process: unzip,reproject:EPSG:4326,cog
  - export:
      format: cloud_optimized_geotiff
      destination: s3://my-bucket/london/
```
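The schedule `0 6 * * 1` reads, field by field, as minute 0, hour 6, any day of month, any month, weekday 1 (Monday). The relative `date_range` keywords are resolved at run time. The README does not specify the exact boundaries, so the sketch below is only one plausible interpretation: `resolve_date_range` is a hypothetical helper, and the inclusive end date and "first of the month" rule are assumptions.

```python
from datetime import date, timedelta


def resolve_date_range(keyword, today):
    """Turn a relative keyword into (start, end) dates, under assumed rules."""
    if keyword == "last_7_days":
        return today - timedelta(days=7), today
    if keyword == "last_30_days":
        return today - timedelta(days=30), today
    if keyword == "this_month":
        return today.replace(day=1), today
    raise ValueError(f"unknown date range keyword: {keyword!r}")


run_day = date(2024, 6, 17)  # a Monday, matching the weekly schedule
start, end = resolve_date_range("last_7_days", run_day)
print(start.isoformat(), end.isoformat())  # 2024-06-10 2024-06-17
```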

---

## GeoAI Integration — 24 AI Subsystems

PyGeoVision exposes GeoAI's complete API through `client.geoai.*`. All imports are lazy — GeoAI is only loaded when first accessed.

```python
ga = client.geoai         # GeoAIEngine proxy
ga.version                # '0.39.2'
ga.is_available           # True
ga.raw()                  # raw geoai module for direct access
```

### Segmentation (`client.geoai.segment`)

```python
# Building footprints — geoai.BuildingFootprintExtractor
client.geoai.segment.buildings(
    "sentinel2.tif", output_path="buildings.tif", output_vector="buildings.geojson",
    confidence_threshold=0.5, chip_size=512, overlap=64,
)

# Solar panel detection — geoai.SolarPanelDetector
client.geoai.segment.solar_panels("aerial.tif", output_vector="solar.geojson")

# Agricultural field delineation — geoai.AgricultureFieldDelineator
client.geoai.segment.agriculture_fields("sentinel2.tif", output_vector="fields.geojson")

# Water body segmentation — geoai.segment_water
client.geoai.segment.water("sentinel2.tif", output_path="water.tif", band_order="sentinel2")

# Custom model segmentation — geoai.semantic_segmentation
client.geoai.segment.custom("scene.tif", "model.pth", "pred.tif", num_classes=5)

# HuggingFace Hub model — geoai.image_segmentation
client.geoai.segment.with_hf_model("scene.tif", "openmmlab/upernet-swin-base")

# SAM auto-segmentation — geoai.mask_generation
client.geoai.segment.with_sam("aerial.tif", output_path="masks.tif")

# timm-backbone model — geoai.timm_semantic_segmentation
client.geoai.segment.timm_model("scene.tif", "timm_model.pth")

# HuggingFace Hub timm model — geoai.timm_segmentation_from_hub
client.geoai.segment.from_hub("scene.tif", "giswqs/building-footprint-usa")
```
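The `chip_size` and `overlap` parameters control how a large scene is cut into tiles for inference. As a rough illustration of the idea, the sketch below computes overlapping windows that fully cover an image. It is a generic tiling scheme written for this README, not GeoAI's implementation, and it leaves out the blending of overlapping predictions.

```python
def tile_origins(length, chip_size, overlap):
    """Start offsets along one axis so that chips cover [0, length)."""
    if not 0 <= overlap < chip_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chip_size")
    if length <= chip_size:
        return [0]
    stride = chip_size - overlap
    origins = list(range(0, length - chip_size, stride))
    origins.append(length - chip_size)  # final chip sits flush with the edge
    return origins


def tile_windows(width, height, chip_size=512, overlap=64):
    """Return (col_off, row_off, win_width, win_height) for every chip."""
    win_w, win_h = min(chip_size, width), min(chip_size, height)
    return [
        (col, row, win_w, win_h)
        for row in tile_origins(height, chip_size, overlap)
        for col in tile_origins(width, chip_size, overlap)
    ]


windows = tile_windows(1000, 600)
print(len(windows), windows[0], windows[-1])
```

Placing the last chip flush with the image edge avoids padding, at the cost of a larger overlap on the final row and column.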

### Detection (`client.geoai.detect`)

```python
# Vehicle detection — geoai.CarDetector
client.geoai.detect.cars("aerial.tif", output_path="cars.geojson")

# Ship detection — geoai.ShipDetector
client.geoai.detect.ships("port_scene.tif", output_path="ships.geojson")

# Parking spot detection — geoai.ParkingSplotDetector
client.geoai.detect.parking("car_park.tif", output_path="spots.geojson")

# Natural language grounded detection — geoai.GroundedSAM
client.geoai.detect.grounded("aerial.tif", "swimming pools", output_path="pools.geojson")
client.geoai.detect.grounded("aerial.tif", "solar panels on rooftops")

# RF-DETR real-time detection — geoai.rfdetr_detect
client.geoai.detect.rfdetr("scene.tif", output_path="detections.geojson")

# Multi-class object detection — geoai.multiclass_detection
client.geoai.detect.multiclass("scene.tif", "nwpu_model.pth", output_path="det.geojson")

# Instance segmentation — geoai.instance_segmentation
client.geoai.detect.instance_segmentation("scene.tif", "maskrcnn.pth")
```

### Classification (`client.geoai.classify`)

```python
# Scene classification — geoai.classify_image
result = client.geoai.classify.classify("tile.tif", "classifier.pth")

# CLIP zero-shot land cover — geoai.CLIPVectorClassifier
client.geoai.classify.land_cover(
    "sentinel2.tif",
    classes=["forest", "water", "urban", "agriculture", "bare soil"],
)

# Batch classification — geoai.classify_images
client.geoai.classify.batch("./image_chips/", "classifier.pth")

# Train a classifier — geoai.train_classifier
client.geoai.classify.train("./dataset/", "classifier.pth", num_classes=8)
```

### Change Detection (`client.geoai.change`)

```python
# ChangeSTAR bi-temporal change detection — geoai.changestar_detect
client.geoai.change.detect(
    "scene_2020.tif",
    "scene_2024.tif",
    output_path="changes.tif",
)

# List available ChangeSTAR model variants
client.geoai.change.list_models()   # ['changestar-v1', 'changestar-v2', ...]
```

### Training (`client.geoai.train`)

```python
# Semantic segmentation — geoai.train_segmentation_model
client.geoai.train.segmentation(
    "./building_chips/", "building_model.pth",
    val_data="./val_chips/", num_classes=2,
    epochs=100, batch_size=16, backbone="efficientnet-b4",
)

# Land cover with specialist losses — geoai.train_segmentation_landcover
client.geoai.train.segmentation_landcover(
    "./lc_chips/", "landcover.pth",
    num_classes=11, loss_fn="unified_focal",    # dice | focal | tversky | unified_focal
)

# Multi-class object detection — geoai.train_multiclass_detector
client.geoai.train.detection("./nwpu_chips/", "detector.pth", num_classes=10)

# Instance segmentation — geoai.train_instance_segmentation_model
client.geoai.train.instance_segmentation("./coco_chips/", "maskrcnn.pth")

# timm-backbone (1000+ backbones) — geoai.train_timm_segmentation_model
client.geoai.train.timm_segmentation(
    "./chips/", "timm_seg.pth", backbone="convnext_base",
)

# Pixel regression (NDVI, height, biomass) — geoai.train_pixel_regressor
client.geoai.train.pixel_regressor("./regression_chips/", "regressor.pth")

# RF-DETR training — geoai.rfdetr_train
client.geoai.train.rfdetr("./coco_data/", "rfdetr.pth")

# Export training chips — geoai.export_training_data
client.geoai.train.generate_chips(
    "sentinel2.tif", "labels.tif", "./chips/", chip_size=256, overlap=32,
)
```

### Foundation Models

```python
# NASA Prithvi (HLS multispectral) — geoai.load_prithvi_model
client.geoai.prithvi.list_models()    # ['prithvi-eo-1.0-100M', 'prithvi-eo-2.0-300M']
model = client.geoai.prithvi.load("prithvi-eo-1.0-100M")
client.geoai.prithvi.infer("hls_tile.tif", model, output_path="pred.tif")

# Segment Anything Model (SAM) — geoai.mask_generation / GroundedSAM
client.geoai.sam.generate_masks("aerial.tif", output_path="masks.tif")
client.geoai.sam.grounded("aerial.tif", "solar panels on rooftops")

# DINOv3 — geoai.analyze_image_patches / train_dinov3_segmentation
client.geoai.dinov3.analyze("scene.tif")
client.geoai.dinov3.similarity_map("scene.tif", query_point=(256, 256))
client.geoai.dinov3.finetune("./chips/", "dino_seg.pth", num_classes=5)
client.geoai.dinov3.segment("scene.tif", "dino_seg.pth", output_path="pred.tif")
```

### Embeddings (`client.geoai.embed`)

```python
# Patch embeddings — geoai.extract_patch_embeddings
embeddings = client.geoai.embed.patch("sentinel2.tif", chip_size=64)   # (N, 512) array

# Pixel embeddings — geoai.extract_pixel_embeddings
pix_emb = client.geoai.embed.pixel("sentinel2.tif")

# Cluster embeddings — geoai.cluster_embeddings
client.geoai.embed.cluster(embeddings, n_clusters=10)

# Cosine similarity — geoai.embedding_similarity
score = client.geoai.embed.similarity(emb_a, emb_b)    # 0.85

# Visualize (UMAP / t-SNE) — geoai.visualize_embeddings
client.geoai.embed.visualize(embeddings)

# Tessera satellite embedding datasets — geoai.tessera_*
client.geoai.tessera.available_years(bbox=(-74.1, 40.6, -73.7, 40.9))
client.geoai.tessera.coverage(bbox=...)
client.geoai.tessera.download(bbox=..., output_dir="./tessera/")
```
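For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their Euclidean norms, giving a value in [-1, 1]. The pure-Python sketch below shows the computation itself; it is independent of GeoAI, whose `embedding_similarity` may handle batching or normalisation differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


score = cosine_similarity([3.0, 4.0], [4.0, 3.0])
print(round(score, 2))  # 0.96
```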

### Cloud, SR, ONNX, Canopy, Captions

```python
# Cloud masking — geoai.predict_cloud_mask_from_raster
client.geoai.cloud.predict("sentinel2.tif", output_path="cloud_mask.tif")
client.geoai.cloud.batch("./scenes/", "./cloud_masks/")
stats = client.geoai.cloud.statistics("cloud_mask.tif")  # {"cloud_cover": 0.15}

# Super-resolution (ESRGAN) — geoai.super_resolution
client.geoai.sr.enhance("landsat.tif", output_path="enhanced.tif", scale_factor=4)

# ONNX export and inference — geoai.export_to_onnx / onnx_semantic_segmentation
client.geoai.onnx.export(model, "model.onnx", input_shape=(1, 4, 512, 512))
client.geoai.onnx.segmentation("scene.tif", "model.onnx", output_path="pred.tif")

# Canopy height estimation — geoai.canopy_height_estimation
client.geoai.canopy.estimate("sentinel2.tif", output_path="canopy_height.tif")

# Moondream VLM captioning — geoai.moondream_*
caption = client.geoai.caption.caption("tile.tif")
answer = client.geoai.caption.query("tile.tif", "Are there buildings?")
client.geoai.caption.detect("tile.tif", "cars")

# Water body segmentation with sensor presets — geoai.segment_water
client.geoai.water.segment("s2.tif", band_order="sentinel2")  # or "naip", "landsat"

# RF-DETR detection — geoai.rfdetr_detect
client.geoai.rfdetr.detect("scene.tif", output_path="det.geojson")
client.geoai.rfdetr.list_models()
client.geoai.rfdetr.from_hub("scene.tif", "rfdetr-base")

# Interactive visualization — geoai.Map / view_raster / view_vector
m = client.geoai.map.leafmap()
client.geoai.map.view_raster("scene.tif")
client.geoai.map.view_vector("buildings.geojson")
```

### Utilities (`client.geoai.utils`)

```python
client.geoai.utils.raster_info("scene.tif")          # width, height, bands, CRS
client.geoai.utils.raster_to_vector("pred.tif", "polygons.geojson")
client.geoai.utils.vector_to_raster("polys.geojson", "ref.tif", "raster.tif")
client.geoai.utils.clip_by_bbox("scene.tif", bbox, "clipped.tif")
client.geoai.utils.mosaic(["tile1.tif", "tile2.tif"], "mosaic.tif")
client.geoai.utils.stack_bands(["B04.tif", "B08.tif", "B11.tif"], "stack.tif")
client.geoai.utils.smooth_vector("buildings.geojson", "smooth.geojson")
client.geoai.utils.regularize("buildings.geojson", "regular.geojson")
iou = client.geoai.utils.iou(pred_mask, gt_mask)
metrics = client.geoai.utils.segmentation_metrics(pred, target)  # miou, f1, acc
device = client.geoai.utils.get_device()              # 'cuda' | 'mps' | 'cpu'
client.geoai.utils.empty_cache()
```
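Intersection over Union (IoU) for binary masks is the number of pixels set in both masks divided by the number set in either. The sketch below computes it on nested lists to show the definition. Returning 1.0 when both masks are empty is a convention chosen here, and GeoAI's `iou` utility may differ in that edge case and in the input types it accepts.

```python
def binary_iou(pred, target):
    """IoU of two same-shaped binary masks given as nested sequences."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            union += p or t
    # Two empty masks agree perfectly; treat that as IoU = 1.0 (a convention).
    return intersection / union if union else 1.0


pred_mask = [[1, 1, 0], [0, 1, 0]]
gt_mask = [[1, 0, 0], [0, 1, 1]]
print(binary_iou(pred_mask, gt_mask))  # 0.5
```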

---

## End-to-End Pipelines (10)

Each pipeline orchestrates the full workflow: **PyGeoFetch search → download → GeoAI inference → raster or vector output**.

```mermaid
graph LR
    subgraph "End-to-End Pipeline Flow"
        S[🛰️ PyGeoFetch\nSearch] --> D[📥 PyGeoFetch\nDownload]
        D --> PP[⚙️ Post-Process\nunzip · reproject · cog]
        PP --> AI[🤖 GeoAI\nInference]
        AI --> V[📊 Vector Output\nGeoJSON · GeoParquet]
        V --> STATS[📈 Statistics\n& Metadata]
    end
```

| Pipeline | Data Source | AI Model | Output |
|----------|------------|----------|--------|
| `building_footprints` | Sentinel-2 / NAIP (PC) | GeoAI BuildingFootprintExtractor | GeoJSON polygons |
| `change_detection` | Bi-temporal Sentinel-2 | GeoAI ChangeSTAR | Change mask GeoTIFF |
| `land_cover` | Sentinel-2 (PC / Copernicus) | ESA WorldCover / GeoAI SegFormer | Classification GeoTIFF |
| `water_bodies` | Sentinel-2 | GeoAI segment_water (NDWI) | Water polygon GeoJSON |
| `solar_detection` | NAIP / Sentinel-2 | GeoAI SolarPanelDetector | GeoJSON polygons |
| `crop_monitoring` | Sentinel-2 seasonal stack | GeoAI SegFormer-B2 | Crop type map |
| `disaster_assessment` | Post-event imagery | GeoAI Siamese-UNet | Damage assessment |
| `deforestation` | Bi-temporal Landsat/S2 | GeoAI ChangeFormer | Forest loss mask |
| `urban_growth` | Bi-temporal Landsat | GeoAI Siamese-UNet | Urban expansion map |
| `carbon_estimation` | Sentinel-2 NDVI | NDVI → AGB (above-ground biomass) formula | Carbon stock estimate |
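
The `carbon_estimation` row starts from NDVI, which is computed per pixel as (NIR - Red) / (NIR + Red); for Sentinel-2 these are bands B08 and B04. A minimal sketch follows. The function name `ndvi` is made up for this example, and the NDVI-to-biomass step is left out because this README does not state the formula the pipeline uses.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel of reflectance values."""
    total = nir + red
    # Guard against division by zero on no-data pixels where both bands are 0.
    return (nir - red) / total if total else 0.0


print(round(ndvi(0.45, 0.05), 3))  # high NIR and low red reflectance give roughly 0.8
print(ndvi(0.0, 0.0))              # no-data pixel falls back to 0.0
```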

```python
# Building footprints
result = client.pipeline("building_footprints",
    bbox=(-0.15, 51.47, -0.10, 51.52), date="2024-06")

# Bi-temporal change detection
result = client.pipeline("change_detection",
    bbox=(-74.1, 40.6, -73.7, 40.9),
    date_before="2020-01", date_after="2024-01")

# Land cover with ESA WorldCover source
result = client.pipeline("land_cover",
    bbox=..., date="2024-06", source="worldcover")

# Solar panel mapping
result = client.pipeline("solar_detection",
    bbox=..., date="2024-06")

# All pipelines return PipelineResult
result.success        # True
result.output_path    # Path('./results/building_footprints/prediction.tif')
result.stats          # {"buildings_detected": 1847, "coverage_pct": 0.312}
result.metadata       # {"provider": "planetary_computer", "scene_id": "..."}
```
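
Downstream code usually checks `success` before touching the other fields. The sketch below uses a hypothetical stand-in class, `PipelineResultSketch`, that mirrors only the four attributes listed above; the real `PipelineResult` may carry more fields or behave differently, and `summarize` is an illustrative helper rather than part of the library.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class PipelineResultSketch:
    """Hypothetical stand-in mirroring the four attributes shown above."""
    success: bool
    output_path: Path
    stats: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


def summarize(result):
    """Return a one-line summary, or raise if the pipeline reported failure."""
    if not result.success:
        raise RuntimeError("pipeline did not complete successfully")
    stats = ", ".join(f"{key}={value}" for key, value in sorted(result.stats.items()))
    provider = result.metadata.get("provider", "unknown")
    return f"{result.output_path.name} from {provider}: {stats}"


example = PipelineResultSketch(
    success=True,
    output_path=Path("./results/building_footprints/prediction.tif"),
    stats={"buildings_detected": 1847, "coverage_pct": 0.312},
    metadata={"provider": "planetary_computer"},
)
print(summarize(example))
```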

---

## PyGeoVision's Own AI Stack

In addition to the GeoAI integration, PyGeoVision ships its own production AI stack for training custom models on geospatial data.

### Model Registry (14 Architectures)

| Model | Task | Architecture | Pretrained |
|-------|------|-------------|:---------:|
| `unet_resnet50` | Segmentation | U-Net + ResNet-50 | ✓ |
| `unet_efficientnet_b4` | Segmentation | U-Net + EfficientNet-B4 | ✓ |
| `segformer_b2` | Segmentation | SegFormer-B2 | ✓ |
| `segformer_b5` | Segmentation | SegFormer-B5 | ✓ |
| `deeplabv3plus_resnet101` | Segmentation | DeepLabV3+ | ✓ |
| `fcos_resnet50` | Detection | FCOS (anchor-free) | ✓ |
| `retinanet_resnet50` | Detection | RetinaNet | ✓ |
| `resnet50_cls` | Classification | ResNet-50 | ✓ |
| `efficientnet_b3_cls` | Classification | EfficientNet-B3 | ✓ |
| `vit_b16_cls` | Classification | ViT-B/16 | ✓ |
| `siamese_unet` | Change Detection | Siamese U-Net | |
| `changeformer` | Change Detection | ChangeFormer | ✓ |
| `esrgan_geo` | Super Resolution | ESRGAN-Geo | |
| `srcnn` | Super Resolution | SRCNN | |

```python
# Load from registry
from pygeovision.ai.models import ModelHub
hub = ModelHub()
model = hub.load("segformer_b2", num_classes=5, pretrained=True)

# List models
from pygeovision.ai.models.registry import registry
models = registry.list_models(task="segmentation", pretrained_only=True)
```
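
To make the `task` and `pretrained_only` filters concrete, here is a small self-contained sketch of how such a filter could behave. It is not the library's registry: `ModelEntry`, `ENTRIES`, and the task strings other than `"segmentation"` are assumptions, and the entries are a hand-copied subset of the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str
    task: str
    pretrained: bool


# A subset of the table above, transcribed by hand for illustration.
ENTRIES = [
    ModelEntry("unet_resnet50", "segmentation", True),
    ModelEntry("segformer_b2", "segmentation", True),
    ModelEntry("fcos_resnet50", "detection", True),
    ModelEntry("siamese_unet", "change_detection", False),
    ModelEntry("changeformer", "change_detection", True),
    ModelEntry("srcnn", "super_resolution", False),
]


def list_models(entries, task=None, pretrained_only=False):
    """Return model names matching the optional task and pretrained filters."""
    return [
        entry.name
        for entry in entries
        if (task is None or entry.task == task)
        and (not pretrained_only or entry.pretrained)
    ]


print(list_models(ENTRIES, task="change_detection"))
# ['siamese_unet', 'changeformer']
print(list_models(ENTRIES, task="change_detection", pretrained_only=True))
# ['changeformer']
```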

### GeoTrainer

```python
from pygeovision.ai.training.trainer import GeoTrainer

trainer = GeoTrainer(
    model=model,
    train_dataset=train_ds,
    val_dataset=val_ds,
    num_classes=5,
    max_epochs=100,
    learning_rate=1e-4,
    batch_size=16,
    loss_fn="dice",             # dice | focal | tversky | unified_focal | weighted_ce
    optimizer="adamw",
    scheduler="cosine",
    mixed_precision=True,
    gradient_accumulation_steps=4,
    callbacks=["early_stopping", "model_checkpoint", "rich_progress"],
    export_onnx=True,           # auto-export on training complete
)
result = trainer.fit()
```

**Supported losses:** `DiceLoss`, `FocalLoss`, `DiceFocalLoss`, `TverskyLoss`, `WeightedCrossEntropyLoss`, `ChangeDetectionLoss`

**Supported metrics:** `SegmentationMetrics` (mIoU, F1, accuracy, precision, recall), `ConfusionMatrix`, `BinaryMetrics`
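
As background on the Dice loss listed above, the standard soft Dice formulation can be written in a few lines of plain Python. This is a sketch of the textbook formula, not the body of `DiceLoss`; the function name `soft_dice_loss` and the `smooth` default are assumptions.

```python
def soft_dice_loss(probabilities, targets, smooth=1.0):
    """Soft Dice loss over flat sequences of predicted probabilities and 0/1 targets."""
    intersection = sum(p * t for p, t in zip(probabilities, targets))
    denominator = sum(probabilities) + sum(targets)
    # The smoothing term keeps the ratio defined when both inputs are all zeros.
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice


perfect = soft_dice_loss([1.0, 0.0, 1.0], [1, 0, 1])
poor = soft_dice_loss([0.0, 1.0, 0.0], [1, 0, 1])
print(perfect, poor)  # a perfect prediction scores 0.0; a fully wrong one scores higher
```

Because the loss is driven by overlap rather than per-pixel accuracy, it tends to be less sensitive to class imbalance than plain cross-entropy, which is one common reason to choose it for sparse targets such as buildings.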

### Automated Labeling (7 Labelers)

```python
# OpenStreetMap polygons
client.ai.label(tiles, labeler="osm", feature_type="building")

# Microsoft Building Footprints (global coverage)
client.ai.label(tiles, labeler="microsoft_buildings")

# Google Open Buildings
client.ai.label(tiles, labeler="google_buildings")

# ESA WorldCover (land cover)
client.ai.label(tiles, labeler="esa_worldcover")

# Google Dynamic World (near-real-time)
client.ai.label(tiles, labeler="dynamic_world")

# SAM auto-labeling
client.ai.label(tiles, labeler="sam")

# Foundation model labeling
client.ai.label(tiles, labeler="foundation", model_id="giswqs/building-footprint")
```

### TiledInference

```python
from pygeovision.ai.inference.tiled_inference import TiledInference

engine = TiledInference(
    model=model,
    tile_size=512,
    overlap=64,
    blend_mode="gaussian",      # gaussian | average
    batch_size=4,
    device="cuda",
)
prediction = engine.run("large_scene.tif", "prediction.tif", num_classes=5)
```
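
The idea behind overlapping tiles with Gaussian blending can be sketched in one dimension. Tiles are placed at a stride of `tile_size - overlap`, and each tile's prediction is weighted so that pixels near the tile centre count more than pixels near its edge; in two dimensions the weight map is the outer product of two such windows. The helpers below (`tile_offsets`, `gaussian_window`, and the `sigma_fraction` parameter) are illustrative assumptions, not the internals of `TiledInference`.

```python
import math


def tile_offsets(length, tile_size=512, overlap=64):
    """Start offsets of tiles covering `length` pixels with the given overlap."""
    if length <= tile_size:
        return [0]
    stride = tile_size - overlap
    offsets = list(range(0, length - tile_size, stride))
    # Shift the last tile back so it ends exactly at the image edge.
    offsets.append(length - tile_size)
    return offsets


def gaussian_window(tile_size, sigma_fraction=0.25):
    """1-D Gaussian weights peaking at the tile centre (sigma is an assumed choice)."""
    centre = (tile_size - 1) / 2.0
    sigma = tile_size * sigma_fraction
    return [math.exp(-0.5 * ((i - centre) / sigma) ** 2) for i in range(tile_size)]


print(tile_offsets(1200))          # [0, 448, 688]
weights = gaussian_window(512)
print(weights[0] < weights[255])   # True: edge pixels get less weight than centre pixels
```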

---

## Command-Line Interface

```bash
# ─── System ──────────────────────────────────────────────────────────────────
pygeovision status                          # Full status: PyGeoFetch + GeoAI + torch
pygeovision status --json                   # Machine-readable JSON
pygeovision doctor                          # Diagnose installation and connectivity

# ─── Authentication (via PyGeoFetch keyring) ─────────────────────────────────
pygeovision data auth add usgs --username USER --password PASS
pygeovision data auth add planet --api-key PL-xxxx
pygeovision data auth add copernicus --client-id ID --client-secret SECRET
pygeovision data auth list
pygeovision data auth test planetary_computer
pygeovision data auth remove usgs

# ─── Providers ───────────────────────────────────────────────────────────────
pygeovision data providers                  # List all 22 providers
pygeovision data providers --open-only      # Open-access only
pygeovision data providers --sar            # SAR-capable only
pygeovision data providers --sub-meter      # Sub-metre resolution only

# ─── Search ──────────────────────────────────────────────────────────────────
pygeovision data search \
    --bbox "-74.1,40.6,-73.7,40.9" \
    --date 2024-06 \
    --providers planetary_computer \
    --cloud-max 10 \
    --output results.geojson


# Search with explicit start and end dates; CLI output is redirected to out.txt
pygeovision data search --bbox -74.1 40.6 -73.7 40.9 \
    --start-date 2024-06-01 --end-date 2024-06-30 \
    --providers planetary_computer --cloud-max 30 \
    --output result_new.geojson > out.txt 2>&1

# Searching with pygeofetch directly
pygeofetch search run --bbox "-74.1,40.6,-73.7,40.9" \
    --start-date 2024-01-01 --cloud-cover 0-15 \
    --providers planetary_computer --output results.geojson

pygeovision data search --bbox ... --satellite sentinel-2 --format table
pygeovision data search --bbox ... --collections sentinel-2-l2a,landsat-c2-l2

# ─── Download ────────────────────────────────────────────────────────────────
pygeovision data download \
    --from-search result_new.geojson \
    --output ./data/ \
    --parallel 4 \
    --verify-checksum \
    --post-process unzip,reproject:EPSG:4326,compress:lzw,cog

# ─── Pipeline ────────────────────────────────────────────────────────────────
pygeovision data pipeline run weekly-sentinel2.yaml
pygeovision data pipeline validate weekly-sentinel2.yaml
pygeovision data pipeline schedule weekly-sentinel2.yaml --cron "0 6 * * 1"
pygeovision data pipeline list
pygeovision data pipeline history

# ─── Cache ───────────────────────────────────────────────────────────────────
pygeovision data cache stats
pygeovision data cache clear --older-than 7d
pygeovision data cache clear --provider planetary_computer

# ─── AI: Segmentation ────────────────────────────────────────────────────────
pygeovision ai segment buildings \
    --input sentinel2.tif \
    --output buildings.tif \
    --vector buildings.geojson \
    --confidence 0.5

pygeovision ai segment solar --input aerial.tif --output solar.tif
pygeovision ai segment water --input s2.tif --output water.tif
pygeovision ai segment agriculture --input s2.tif --output fields.geojson
pygeovision ai segment custom --input scene.tif --output pred.tif --model model.pth

# ─── AI: Detection ───────────────────────────────────────────────────────────
pygeovision ai detect cars --input aerial.tif --output cars.geojson
pygeovision ai detect ships --input port.tif --output ships.geojson
pygeovision ai detect grounded --input aerial.tif --prompt "swimming pools" --output pools.geojson
pygeovision ai detect rfdetr --input scene.tif --output det.geojson

# ─── AI: Classification ──────────────────────────────────────────────────────
pygeovision ai classify scene --input tile.tif --model classifier.pth
pygeovision ai classify land-cover --input s2.tif --classes "forest,water,urban,agriculture"

# ─── AI: Training ────────────────────────────────────────────────────────────
pygeovision ai train segmentation \
    --data ./building_chips/ \
    --output building_model.pth \
    --num-classes 2 \
    --epochs 100 \
    --backbone efficientnet-b4

pygeovision ai train land-cover \
    --data ./lc_chips/ \
    --output lc_model.pth \
    --num-classes 11 \
    --loss-fn unified_focal

pygeovision ai train detection \
    --data ./nwpu_chips/ \
    --output detector.pth \
    --num-classes 10

# ─── AI: Inference & Utilities ───────────────────────────────────────────────
pygeovision ai infer \
    --input large_scene.tif \
    --model model.pth \
    --output prediction.tif \
    --num-classes 5 \
    --tile-size 512 \
    --overlap 64

pygeovision ai chips \
    --image sentinel2.tif \
    --label labels.tif \
    --output ./chips/ \
    --chip-size 256

pygeovision ai cloud-mask --input sentinel2.tif --output cloud.tif

# ─── End-to-End Pipelines ────────────────────────────────────────────────────
pygeovision pipeline building_footprints \
    --bbox -0.15 51.47 -0.10 51.52 \
    --date 2024-06 \
    --output ./results/

pygeovision pipeline change_detection \
    --bbox -74.1 40.6 -73.7 40.9 \
    --date-before 2020-01 \
    --date-after 2024-01 \
    --output ./changes/

pygeovision pipeline list                   # Show all available pipelines

# ─── Models ──────────────────────────────────────────────────────────────────
pygeovision models list
pygeovision models list --task segmentation --pretrained-only
pygeovision models info unet_resnet50
pygeovision models cache
pygeovision models cache --clear
```
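
The examples above write `--bbox` both as one comma-separated string and as four space-separated numbers. When building these commands from scripts, a small helper that accepts either form and validates the values can be handy. The sketch below assumes the order is west, south, east, north (consistent with the London and New York examples in this README); `parse_bbox` is a hypothetical helper, not part of the CLI.

```python
def parse_bbox(value):
    """Parse 'west,south,east,north' (commas or whitespace) into a validated tuple."""
    parts = value.replace(",", " ").split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(parts)}")
    west, south, east, north = (float(part) for part in parts)
    if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
        raise ValueError("longitude out of range")
    if not (-90.0 <= south <= north <= 90.0):
        raise ValueError("latitude out of range or south > north")
    return (west, south, east, north)


print(parse_bbox("-74.1,40.6,-73.7,40.9"))    # (-74.1, 40.6, -73.7, 40.9)
print(parse_bbox("-0.15 51.47 -0.10 51.52"))  # (-0.15, 51.47, -0.1, 51.52)
```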

---

## Package Structure

```
pygeovision/                          5,845 lines across core files
│
├── __init__.py                 709   PyGeoVision — main client class
├── _version.py                       Version
├── api/__init__.py                   PyGeoVisionClient — web/notebook API
│
├── core/
│   ├── config.py                     PyGeoVisionConfig (Pydantic, YAML, env vars)
│   ├── engine.py                     Core engine wrapping SatelliteFetcher
│   └── exceptions.py                 Full exception hierarchy
│
├── data/                      2,275  PyGeoFetch integration layer
│   ├── fetch.py              1,636   SatelliteFetcher (CLI subprocess + pystac_client)
│   ├── providers.py            280   All 22 providers, STAC endpoints, shortcuts
│   └── pipeline.py             359   DataPipeline YAML builder
│
├── ai/                               AI layer
│   ├── engine.py                     AIEngine — lazy hub for own AI stack
│   ├── geoai/__init__.py      1,278  GeoAIEngine — 24 GeoAI subsystem proxies
│   │
│   ├── models/
│   │   ├── registry.py               14 registered model architectures
│   │   ├── hub.py                    ModelHub — download & cache weights
│   │   └── architectures/            UNet, SegFormer, FCOS, ViT, ...
│   │
│   ├── training/
│   │   ├── trainer.py                GeoTrainer — full training loop
│   │   ├── losses.py                 6 specialist loss functions
│   │   ├── metrics.py                IoU, F1, accuracy, confusion matrix
│   │   ├── callbacks.py              EarlyStopping, ModelCheckpoint, Progress
│   │   ├── distributed.py            Multi-GPU / DDP
│   │   ├── optimizers.py             AdamW, SGD, Lion, schedulers
│   │   └── export.py                 ONNX, TorchScript export
│   │
│   ├── inference/
│   │   ├── tiled_inference.py        TiledInference — Gaussian-blend tiling
│   │   ├── postprocessing.py         PostProcessor — smooth, regularize
│   │   ├── ensemble.py               EnsembleInference
│   │   ├── validation.py             InferenceValidator
│   │   └── vectorization.py          Raster → vector conversion
│   │
│   ├── labeling/
│   │   ├── osm_labeler.py            OpenStreetMap polygon labels
│   │   ├── microsoft_buildings.py    Microsoft global footprints
│   │   ├── google_buildings.py       Google Open Buildings
│   │   ├── esa_worldcover.py         ESA WorldCover land cover
│   │   ├── dynamic_world.py          Google Dynamic World
│   │   ├── sam_labeler.py            SAM auto-labeling
│   │   └── foundation_labeler.py     Foundation model labeling
│   │
│   ├── data/
│   │   ├── tiling.py                 GeoTIFF tiling and chip extraction
│   │   ├── dataset.py                TileMetadata, GeoDataset
│   │   ├── dataloader.py             GeoDataLoader (PyGeoFetch → training)
│   │   ├── augmentations.py          Geo-aware augmentations
│   │   ├── preprocessing.py          Normalisation, band selection
│   │   └── sampler.py                Weighted, stratified samplers
│   │
│   ├── pipelines/__init__.py   544   10 end-to-end geospatial pipelines
│   ├── monitoring/__init__.py        DriftDetector, PerformanceTracker
│   └── experiments/__init__.py       ExperimentTracker
│
├── cli/main.py                1,039  Complete CLI (data + ai + pipeline + models)
└── models/__init__.py                Pydantic schemas
```

---

## Configuration

```python
from pygeovision.core.config import PyGeoVisionConfig

# Load from YAML
config = PyGeoVisionConfig.load("pygeovision.yaml")

# Programmatic
config = PyGeoVisionConfig(
    gpu={"device": "cuda", "mixed_precision": True},
    training={"batch_size": 32, "learning_rate": 5e-5, "max_epochs": 200},
    model_hub={"cache_dir": "/data/models/"},
    pygeofetch={
        "default_providers": ["planetary_computer", "copernicus"],
        "cache_ttl_seconds": 7200,
        "download_parallel": 8,
        "verify_checksum": True,
    },
)

# Save
config.save("~/.pygeovision/config.yaml")

# Get PyGeoFetch config subset
config.as_pygeofetch_config()
```

**Environment variables:**
```bash
PYGEOVISION_GPU_DEVICE=cuda
PYGEOVISION_GPU_MIXED_PRECISION=true
PYGEOVISION_TRAINING_BATCH_SIZE=32
PYGEOVISION_TRAINING_LR=5e-5
PYGEOVISION_MODEL_HUB_CACHE_DIR=/data/models
PYGEOVISION_LOG_LEVEL=DEBUG
```

**Config resolution order (later sources override earlier ones):**
1. Built-in defaults
2. `~/.pygeovision/config.yaml`
3. `.pygeovision.yaml` (project-level)
4. Environment variables (`PYGEOVISION_*`)
5. Constructor arguments
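
One way to picture this precedence is as a recursive dictionary merge applied from the lowest-priority layer to the highest. The sketch below is an illustration under that assumption, not the logic inside `PyGeoVisionConfig`; `deep_merge`, `resolve_config`, and the default values shown are hypothetical.

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_config(*layers):
    """Merge layers listed from lowest to highest priority."""
    resolved = {}
    for layer in layers:
        resolved = deep_merge(resolved, layer)
    return resolved


# Hypothetical layer contents, already parsed into nested dictionaries.
defaults = {"gpu": {"device": "cpu", "mixed_precision": False}, "training": {"batch_size": 16}}
user_yaml = {"gpu": {"device": "cuda"}}
env_vars = {"training": {"batch_size": 32}}   # e.g. from PYGEOVISION_TRAINING_BATCH_SIZE
constructor = {"gpu": {"mixed_precision": True}}

print(resolve_config(defaults, user_yaml, env_vars, constructor))
# {'gpu': {'device': 'cuda', 'mixed_precision': True}, 'training': {'batch_size': 32}}
```

Merging recursively rather than replacing whole sections means a layer that sets only `gpu.device` leaves `gpu.mixed_precision` from a lower layer intact.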

---

## Cache & Status

```python
# System status
status = client.status()
# {
#   "pygeovision_version": "1.0.0",
#   "pygeofetch": {"available": True, "version": "1.0.0", "providers": 22, "open_providers": 10},
#   "geoai": {"available": True, "version": "0.39.2"},
#   "torch": {"version": "2.12.0", "cuda": True, "device": "cuda", "gpu": "NVIDIA A100"},
#   "rasterio": "1.5.0",
#   "geopandas": "1.1.3",
#   "registered_ai_models": 14,
# }

# Cache management (delegates to PyGeoFetch)
client.cache_stats()             # {"entries": 42, "size_mb": 8.7, "location": "..."}
client.clear_cache()             # clear all
client.clear_cache(provider="planetary_computer")
client.clear_cache(older_than="7d")
client.data.set_cache_ttl(7200)  # 2-hour TTL
client.data.prune_cache(max_size_gb=5.0)

# Diagnostics
client.doctor()                  # pygeofetch doctor
client.test_provider("copernicus")
```
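
The `older_than="7d"` argument is a compact duration string, while `set_cache_ttl` takes seconds. A sketch of how such a string could be turned into a `timedelta` is shown below. Only the `d` suffix appears in this README; the other suffixes and the `parse_age` helper are assumptions for illustration, and PyGeoFetch may accept a different grammar.

```python
import re
from datetime import timedelta

# Assumed suffixes; only "d" is documented above.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_age(text):
    """Convert strings such as '7d' or '2h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhdw])", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_age("7d"))                        # 7 days, 0:00:00
print(int(parse_age("2h").total_seconds()))   # 7200, the TTL used in the example above
```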

---

## Testing

```
208 passing  19 skipped  0 failing

tests/
├── test_core.py              14  Config, exceptions, engine
├── test_data_layer.py        68  SatelliteFetcher, SearchResult, DataPipeline, providers
├── test_geoai_integration.py 68  All 24 GeoAI subsystems (mocked)
├── test_validation.py        21  TiledInference, vectorization
├── test_training.py              GeoTrainer, losses, metrics
├── test_labeling.py              7 labelers
├── test_inference.py             PostProcessor, Ensemble
├── test_integration.py           End-to-end pipeline integration
└── conftest.py                   Shared fixtures
```

```bash
# Run the full test suite (no GPU, no real network calls)
cd pygeovision_v2
pip install -e ".[dev]"
pytest tests/ -q                                    # all tests
pytest tests/test_data_layer.py -v                  # data layer only
pytest tests/test_geoai_integration.py -v           # GeoAI subsystems only
pytest tests/ --cov=pygeovision --cov-report=html   # with coverage
```

---

## Comparing PyGeoVision to Alternatives

| Feature | **PyGeoVision** | EODAG | TorchGeo | TerraTorch | Raw GeoAI |
|---------|:-----------:|:-----:|:--------:|:----------:|:---------:|
| Data providers | **22+** | 10+ | Limited | Limited | 3 |
| PyGeoFetch integration | ✅ Native | ❌ | ❌ | ❌ | ❌ |
| GeoAI integration | ✅ Full 24 subsystems | ❌ | ❌ | ❌ | ✅ Direct |
| CLI | ✅ Full | ✅ Partial | ❌ | ❌ | ❌ |
| YAML pipelines | ✅ | ❌ | ❌ | ❌ | ❌ |
| Auth / keyring | ✅ | ✅ | ❌ | ❌ | ❌ |
| Parallel downloads | ✅ | ✅ | ❌ | ❌ | ❌ |
| Post-processing chain | ✅ | ❌ | ❌ | ❌ | ❌ |
| End-to-end pipelines | **10** | ❌ | ❌ | ❌ | ❌ |
| SAM / GroundedSAM | ✅ | ❌ | ❌ | ❌ | ✅ |
| Foundation models | ✅ (Prithvi, DINOv3) | ❌ | ✅ | ✅ | ✅ |
| Automated labeling | **7 labelers** | ❌ | ❌ | ❌ | ❌ |
| ONNX export | ✅ | ❌ | ❌ | ❌ | ✅ |
| Commercial providers | ✅ Planet, Maxar, Airbus | ❌ | ❌ | ❌ | ❌ |

---

## Acknowledgements

PyGeoVision is built on top of two exceptional open-source projects:

- **[PyGeoFetch](https://github.com/appiahkubis14/PyGeoFetch)** — Universal satellite data pipeline by the PyGeoFetch team. PyGeoVision uses PyGeoFetch for all data search, download, authentication, caching, and pipeline orchestration.

- **[GeoAI](https://opengeoai.org)** — Artificial Intelligence for Geospatial Data by [Qiusheng Wu](https://github.com/giswqs) and contributors. PyGeoVision wraps GeoAI for all AI inference, training, and model management. Published in [JOSS 2026](https://doi.org/10.21105/joss.09605).

---

## License

**Apache 2.0** — see [LICENSE](LICENSE)

Copyright © 2026 PyGeoVision Contributors
