Metadata-Version: 2.4
Name: nwpio
Version: 0.1.0
Summary: Download and process NWP forecast data from cloud archives
Author-email: Oceanum Developers <developers@oceanum.science>
Maintainer-email: Oceanum Developers <developers@oceanum.science>
License: MIT
Keywords: weather,forecast,gfs,ecmwf,grib,zarr,nwp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: google-cloud-storage
Requires-Dist: xarray
Requires-Dist: cfgrib
Requires-Dist: zarr<3
Requires-Dist: dask
Requires-Dist: click
Requires-Dist: pydantic
Requires-Dist: pydantic-settings
Requires-Dist: python-dateutil
Requires-Dist: tqdm
Requires-Dist: fsspec
Requires-Dist: gcsfs
Requires-Dist: s3fs
Requires-Dist: pyyaml
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs; extra == "docs"
Requires-Dist: mkdocs-material; extra == "docs"
Requires-Dist: mkdocstrings[python]; extra == "docs"
Dynamic: license-file

# NWPIO

A Python library for downloading and processing Numerical Weather Prediction (NWP) forecast data from cloud archives.

## Features

### Data Download
- **Multiple NWP models** - GFS, ECMWF HRES/ENS support
- **Flexible resolutions** - 0.1°, 0.25°, 0.5°, 1.0° depending on product
- **Configurable cycles** - 00z, 06z, 12z, 18z with variable lead times (up to 384h)
- **Parallel downloads** - Configurable workers for fast transfers
- **GCS-to-GCS copying** - No local storage needed for large files
- **File validation** - Ensures all files are complete before downloading
- **Smart skipping** - Avoid re-downloading existing files

### GRIB Processing
- **Variable extraction** - Select specific variables from GRIB files
- **Time concatenation** - Combine multiple files along time dimension
- **Zarr conversion** - Efficient chunked storage format
- **Configurable chunking** - Optimize for your access patterns
- **Compression support** - Multiple algorithms (zstd, lz4)
- **GRIB key filtering** - Filter by level, type, etc.
- **Parallel GRIB loading** - Fast processing with multiple workers

### Production Ready
- **Type-safe configuration** - Pydantic models with validation
- **Flexible cycle configuration** - CLI (`--cycle`), environment (`$CYCLE`), or config file
- **Multi-process workflow** - Download once, process multiple variable sets
- **Cycle-based formatting** - Dynamic paths with `{cycle:%Y%m%d}` placeholders
- **Comprehensive logging** - Track progress and debug issues
- **Error handling** - Robust recovery and retry logic
- **Automatic cleanup** - Optional GRIB file deletion after processing
- **Docker support** - Container-ready for cloud deployment
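
The cycle can come from the CLI, the environment, or the config file. The sketch below shows one way such a lookup could work. The helper name `resolve_cycle` and the precedence order (CLI, then `$CYCLE`, then config file) are assumptions made for illustration, not confirmed behaviour of nwpio.

```python
import os
from datetime import datetime
from typing import Mapping, Optional


def resolve_cycle(
    cli_value: Optional[str] = None,
    config_value: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> datetime:
    """Pick the first cycle that is set (hypothetical helper, assumed precedence)."""
    env = os.environ if environ is None else environ
    # Assumed order: --cycle flag, then $CYCLE, then the config file value.
    for candidate in (cli_value, env.get("CYCLE"), config_value):
        if candidate:
            return datetime.fromisoformat(candidate)
    raise ValueError("No cycle given via --cycle, $CYCLE, or the config file")


# The environment is passed explicitly here so the example is reproducible.
cycle = resolve_cycle(
    cli_value=None,
    config_value="2024-01-01T00:00:00",
    environ={"CYCLE": "2024-01-02T12:00:00"},
)
print(cycle.isoformat())
```

In this sketch the environment value is used because no CLI value was given.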

## Installation

```bash
pip install -e .
```

For development:
```bash
pip install -e ".[dev]"
```

## Quick Start

### Download GRIB files

```python
from nwpio import GribDownloader, DownloadConfig
from datetime import datetime

config = DownloadConfig(
    product="gfs",
    resolution="0p25",
    forecast_time=datetime(2024, 1, 1, 0),
    cycle="00z",
    max_lead_time=120,  # hours
    source_bucket="global-forecast-system",
    destination_bucket="your-bucket-name",
)

downloader = GribDownloader(config)
downloaded_files = downloader.download()
```

### Process GRIB to Zarr

```python
from nwpio import GribProcessor, ProcessConfig

config = ProcessConfig(
    grib_files=downloaded_files,
    variables=["t2m", "u10", "v10", "tp"],
    output_path="gs://your-bucket/output.zarr",
)

processor = GribProcessor(config)
processor.process()
```

### Using the CLI

```bash
# Download GRIB files
nwpio download \
    --product gfs \
    --resolution 0p25 \
    --time 2024-01-01T00:00:00 \
    --cycle 00z \
    --max-lead-time 120 \
    --source-bucket global-forecast-system \
    --dest-bucket your-bucket-name

# Process GRIB to Zarr
nwpio process \
    --grib-path gs://your-bucket/grib/ \
    --variables t2m,u10,v10,tp \
    --output gs://your-bucket/output.zarr

# Combined workflow
nwpio run \
    --config config.yaml
```

### Configuration File Example

#### Single Process Configuration
```yaml
# config.yaml
download:
  product: gfs
  resolution: 0p25
  cycle: "2024-01-01T00:00:00"
  max_lead_time: 6
  source_bucket: global-forecast-system
  destination_bucket: your-bucket-name
  destination_prefix: nwp-data/

process:
  - filter_by_keys:
      typeOfLevel: heightAboveGround
      level: 10
    zarr_path: gs://your-bucket/wind_{cycle:%Y%m%d}_{cycle:%Hz}.zarr
    variables: [u10, v10]
    write_local_first: true
    max_upload_workers: 16
```
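
The `{cycle:%Y%m%d}` and `{cycle:%Hz}` placeholders in `zarr_path` follow Python's format-spec syntax for `datetime` objects. The sketch below shows how such a template expands with plain `str.format`. That nwpio expands paths this way internally is an assumption, and `expand_cycle` is a hypothetical helper.

```python
from datetime import datetime

# Template copied from the config above.
template = "gs://your-bucket/wind_{cycle:%Y%m%d}_{cycle:%Hz}.zarr"


def expand_cycle(path_template: str, cycle: datetime) -> str:
    """Expand {cycle:...} placeholders with str.format (hypothetical helper)."""
    # datetime supports strftime codes as its format spec; the trailing "z"
    # in "%Hz" is a literal character.
    return path_template.format(cycle=cycle)


zarr_path = expand_cycle(template, datetime(2024, 1, 1, 0))
print(zarr_path)  # gs://your-bucket/wind_20240101_00z.zarr
```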

#### Multi-Process Configuration (Recommended)
Download once, create multiple Zarr archives with different variable sets:

```yaml
# config-multi.yaml
cleanup_grib: true  # Delete GRIB files after all processing

download:
  product: gfs
  resolution: 0p25
  cycle: "2024-01-01T00:00:00"
  max_lead_time: 6
  source_bucket: global-forecast-system
  destination_bucket: your-bucket-name

process:
  # Process 1: 10m winds
  - filter_by_keys:
      typeOfLevel: heightAboveGround
      level: 10
    zarr_path: gs://your-bucket/wind10m_{cycle:%Y%m%d}_{cycle:%Hz}.zarr
    variables: [u10, v10]
    max_upload_workers: 16
    
  # Process 2: 2m temperature and humidity
  - filter_by_keys:
      typeOfLevel: heightAboveGround
      level: 2
    zarr_path: gs://your-bucket/surface_{cycle:%Y%m%d}_{cycle:%Hz}.zarr
    variables: [t2m, d2m]
    max_upload_workers: 16
```

Run with:
```bash
nwpio run --config config-multi.yaml --max-workers 8
```
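
The order of operations implied by this configuration (one download, several processing passes, optional cleanup at the end) can be sketched as follows. `run_workflow` and the three callables are hypothetical stand-ins, not nwpio functions. They only illustrate the control flow.

```python
from typing import Any, Callable, Dict, List, Sequence


def run_workflow(
    download: Callable[[], List[str]],
    process: Callable[[Sequence[str], Dict[str, Any]], str],
    cleanup: Callable[[Sequence[str]], None],
    process_configs: Sequence[Dict[str, Any]],
    cleanup_grib: bool = False,
) -> List[str]:
    """Download once, run every process entry, then optionally clean up."""
    grib_files = download()  # single download shared by all process entries
    outputs = [process(grib_files, cfg) for cfg in process_configs]
    if cleanup_grib:
        cleanup(grib_files)  # runs only after every entry has been processed
    return outputs


# Stand-in callables so the sketch runs without cloud access.
calls: List[str] = []


def fake_download() -> List[str]:
    calls.append("download")
    return ["f000.grib2", "f003.grib2"]


def fake_process(files: Sequence[str], cfg: Dict[str, Any]) -> str:
    calls.append("process")
    return cfg["zarr_path"]


def fake_cleanup(files: Sequence[str]) -> None:
    calls.append("cleanup")


outputs = run_workflow(
    fake_download,
    fake_process,
    fake_cleanup,
    [{"zarr_path": "wind10m.zarr"}, {"zarr_path": "surface.zarr"}],
    cleanup_grib=True,
)
print(outputs, calls)
```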

#### ECMWF Source Selection
ECMWF data is available from two sources. Set `source_type` to choose between them (the two `download` blocks below are alternatives, not parts of one file):

```yaml
# Option 1: GCS (Google Cloud Storage), the official ECMWF bucket (default)
download:
  product: ecmwf-hres
  resolution: 0p25
  source_type: gcs  # Uses ecmwf-open-data bucket (default)
  max_lead_time: 120

# Option 2: AWS S3, an alternative source
# (use in place of the block above; a config file has a single download section)
download:
  product: ecmwf-hres
  resolution: 0p25
  source_type: aws  # Uses ecmwf-forecasts bucket
  max_lead_time: 120

The `source_type` defaults to `gcs`. The appropriate bucket is automatically selected based on the product and source type. You can override with a custom `source_bucket` if needed.
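
A minimal sketch of this selection logic is shown below. The bucket names come from this README, while `resolve_source_bucket` and the lookup table are hypothetical and may not match how nwpio implements it.

```python
from typing import Dict, Optional, Tuple

# Default buckets as listed in this README; the table layout is an assumption.
DEFAULT_BUCKETS: Dict[Tuple[str, str], str] = {
    ("ecmwf-hres", "gcs"): "ecmwf-open-data",
    ("ecmwf-hres", "aws"): "ecmwf-forecasts",
}


def resolve_source_bucket(
    product: str,
    source_type: str = "gcs",
    source_bucket: Optional[str] = None,
) -> str:
    """Return the explicit bucket if given, else the default for the pair."""
    if source_bucket:
        return source_bucket  # a custom bucket always wins
    try:
        return DEFAULT_BUCKETS[(product, source_type)]
    except KeyError:
        raise ValueError(
            f"No default bucket for product={product!r}, source_type={source_type!r}"
        ) from None


print(resolve_source_bucket("ecmwf-hres"))  # ecmwf-open-data
print(resolve_source_bucket("ecmwf-hres", "aws"))  # ecmwf-forecasts
print(resolve_source_bucket("ecmwf-hres", source_bucket="my-mirror"))  # my-mirror
```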

## Supported Products

### GFS (Global Forecast System)
- Resolutions: 0p25 (0.25°), 0p50 (0.5°), 1p00 (1.0°)
- Cycles: 00z, 06z, 12z, 18z
- Lead times: Up to 384 hours

### ECMWF
- Products: HRES (High Resolution), ENS (Ensemble)
- Resolutions: 0p1 (0.1°), 0p25 (0.25°)
- Cycles: 00z, 12z
- Lead times: Up to 240 hours (HRES), 360 hours (ENS)
- Sources:
  - **GCS**: `gs://ecmwf-open-data` (official ECMWF bucket)
  - **AWS**: `s3://ecmwf-forecasts` (alternative source)
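
The limits above can be checked before a download is started. The sketch below encodes the GFS and ECMWF HRES rows as a lookup table. `ProductSpec`, `SUPPORTED`, and `check_request` are hypothetical names, and ENS is left out because its product identifier is not shown in this README.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProductSpec:
    resolutions: Tuple[str, ...]
    cycles: Tuple[str, ...]
    max_lead_time: int  # hours


# Values copied from the lists above; the structure itself is an assumption.
SUPPORTED: Dict[str, ProductSpec] = {
    "gfs": ProductSpec(("0p25", "0p50", "1p00"), ("00z", "06z", "12z", "18z"), 384),
    "ecmwf-hres": ProductSpec(("0p1", "0p25"), ("00z", "12z"), 240),
}


def check_request(product: str, resolution: str, cycle: str, max_lead_time: int) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    spec = SUPPORTED.get(product)
    if spec is None:
        return [f"unknown product {product!r}"]
    problems = []
    if resolution not in spec.resolutions:
        problems.append(f"resolution {resolution!r} not offered for {product}")
    if cycle not in spec.cycles:
        problems.append(f"cycle {cycle!r} not offered for {product}")
    if not 0 <= max_lead_time <= spec.max_lead_time:
        problems.append(f"max_lead_time must be within 0..{spec.max_lead_time} hours")
    return problems


print(check_request("gfs", "0p25", "00z", 120))
print(check_request("ecmwf-hres", "0p25", "06z", 300))
```

The first call returns an empty list. The second reports two problems, because ECMWF has no 06z cycle here and 300 hours exceeds the 240-hour HRES limit.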

## Architecture

```
nwpio/
├── __init__.py
├── config.py          # Configuration models using Pydantic
├── sources.py         # Data source definitions for GFS/ECMWF
├── downloader.py      # GRIB file download logic
├── processor.py       # GRIB to Zarr conversion
├── utils.py           # Utility functions
└── cli.py             # Command-line interface
```

## Requirements

- Python 3.10+
- Google Cloud Storage access (with appropriate credentials)
- GRIB file support (the ecCodes library, needed by cfgrib)

## License

MIT
