Metadata-Version: 2.4
Name: community-forcing-service
Version: 0.1.0
Summary: Community Forcing Service — acquire-and-subset access to meteorological forcing for hydrological models
Project-URL: Homepage, https://github.com/DarriEy/CFS
Project-URL: Documentation, https://darriey.github.io/CFS/
Project-URL: Repository, https://github.com/DarriEy/CFS
Project-URL: Issues, https://github.com/DarriEy/CFS/issues
Project-URL: Changelog, https://github.com/DarriEy/CFS/blob/main/CHANGELOG.md
Author-email: Darri Eythorsson <dae5@hi.is>
License-Expression: MIT
License-File: LICENSE
Keywords: era5,forcing,hydrology,meteorology,reanalysis,xarray,zarr
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Atmospheric Science
Classifier: Topic :: Scientific/Engineering :: Hydrology
Requires-Python: >=3.11
Requires-Dist: click>=8.0
Requires-Dist: numpy>=1.26
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.31
Requires-Dist: structlog>=24.0
Requires-Dist: tenacity>=8.0
Provides-Extra: cds
Requires-Dist: cdsapi>=0.7.2; extra == 'cds'
Provides-Extra: climate
Requires-Dist: aiohttp>=3.9; extra == 'climate'
Requires-Dist: dask>=2024.0; extra == 'climate'
Requires-Dist: fsspec>=2024.0; extra == 'climate'
Requires-Dist: gcsfs>=2024.0; extra == 'climate'
Requires-Dist: h5netcdf>=1.3; extra == 'climate'
Requires-Dist: h5py>=3.10; extra == 'climate'
Requires-Dist: netcdf4>=1.6; extra == 'climate'
Requires-Dist: s3fs>=2024.0; extra == 'climate'
Requires-Dist: xarray>=2024.0; extra == 'climate'
Requires-Dist: zarr>=2.18; extra == 'climate'
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pandas-stubs>=2.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0; extra == 'dev'
Requires-Dist: types-requests>=2.31; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs>=1.6; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.25; extra == 'docs'
Provides-Extra: earthdata
Requires-Dist: pandas>=2.0; extra == 'earthdata'
Requires-Dist: pydap>=3.4; extra == 'earthdata'
Requires-Dist: pyproj>=3.6; extra == 'earthdata'
Provides-Extra: forecast
Requires-Dist: cfgrib>=0.9.10; extra == 'forecast'
Requires-Dist: eccodes>=1.6; extra == 'forecast'
Requires-Dist: pandas>=2.0; extra == 'forecast'
Description-Content-Type: text/markdown

# CFS — Community Forcing Service

Acquire-and-subset access to meteorological **forcing** products for hydrological
modeling.

Acquiring forcing for a modeling study traditionally means bespoke scripting
per product — every product has its own API, native variable names, units,
accumulation conventions, and grid, and the accumulation-to-rate conversion is
re-implemented (and mis-implemented) in every group's scripts. CFS replaces
that with **one async interface over 33 products** that stops at a canonical,
CF-aligned `xarray.Dataset` — deliberately leaving catchment/HRU remapping and
model-specific file formats to modeling frameworks (e.g. SYMFLUENCE).

**Documentation: <https://darriey.github.io/CFS/>**

CFS is the third member of the community-data triad alongside
**CAS** (Community Attribute Service) and **CSFS** (Community Streamflow Service):

| Service | Data | Returns |
|---------|------|---------|
| CAS  | geospatial attributes (DEM, soil, land cover) | harmonized zonal statistics |
| CSFS | streamflow observations | harmonized station time series |
| **CFS** | **meteorological forcing** | **canonical, subset `xarray.Dataset`** |

## The boundary (why CFS stops where it does)

CFS does exactly one job: **acquire a forcing product, subset it to a bounding
box + time range, harmonize it to a canonical schema, and hand back a lazy
`xarray.Dataset`.** That's it.

It deliberately does **not**:

- remap to HRUs / sub-basins,
- write model-specific forcing schemas (SUMMA, FUSE, mizuRoute, …),
- serialize monthly NetCDF chunks or handle HPC filesystem locking.

Those steps are model- and deployment-specific, so they stay in the consumer
(e.g. SYMFLUENCE). Keeping the boundary here is what makes CFS reusable across
frameworks rather than a SYMFLUENCE library in disguise.

```
 upstream store ──▶  subset to bbox+time  ──▶  harmonize to canonical  ──▶  xr.Dataset
   (Zarr/S3/…)        cfs.subset.bbox            cfs.subset.canonical          │
                                                                               ▼
                                              [ consumer: HRU remap + model schema ]
```

## Canonical schema (`canonical-v1`)

Every connector renames native variables to CF-aligned canonical names and
converts to canonical SI units (see `cfs/core/vocabulary.py`). Precipitation and
radiation are always returned as **rates** (`kg m-2 s-1`, `W m-2`), never
accumulations — the conversion that most often goes wrong is done once, here.
The output contract (names, units, attrs, grid layouts, time conventions) is
specified normatively in
[the canonical-v1 spec](https://darriey.github.io/CFS/canonical-v1/).
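
To make the accumulation-to-rate step concrete, here is a minimal sketch in plain Python. It is not the code in `cfs/core/vocabulary.py`: the function names are hypothetical, and it assumes precipitation accumulated as metres of water per step and radiation accumulated as `J m-2` per step (the ERA5 convention), with a constant water density of 1000 `kg m-3`.

```python
WATER_DENSITY = 1000.0  # kg m-3, assumed constant


def precip_depth_to_flux(depth_m: float, step_seconds: float) -> float:
    """Convert a precipitation depth accumulated over one step (m) to kg m-2 s-1."""
    return depth_m * WATER_DENSITY / step_seconds


def energy_to_flux(energy_j_m2: float, step_seconds: float) -> float:
    """Convert radiation energy accumulated over one step (J m-2) to W m-2."""
    return energy_j_m2 / step_seconds


# 0.36 mm of rain accumulated over one hour
hourly_rate = precip_depth_to_flux(0.00036, 3600.0)
# 1.8 MJ m-2 of shortwave accumulated over one hour
hourly_sw = energy_to_flux(1.8e6, 3600.0)
```

Both conversions divide by the length of the accumulation step, so a wrong step length (hourly versus daily, for instance) scales the resulting rate by the same factor.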

## Install

```bash
pip install 'community-forcing-service[climate]'   # xarray, zarr, gcsfs, dask, netcdf4
```

The distribution is named `community-forcing-service` (the name `cfs` is taken
on PyPI), but the import package and CLI are still `cfs` (`import cfs`). From a
checkout:

```bash
pip install -e '.[climate]'
```

## Use

```bash
cfs providers                    # list registered providers
cfs products                     # list products + canonical variables
cfs fetch \
  -P era5_arco:single_levels \
  -b -114.5,50.7,-114.0,51.1 \
  --start 2015-06-01T00:00 --end 2015-06-01T06:00 \
  -v air_temperature,precipitation_flux
```

Python:

```python
import asyncio

from cfs.core.models import BoundingBox, TimeRange
from cfs.core.registry import discover, get_connector
from cfs.core.vocabulary import CanonicalVar


async def main():
    discover()  # find and register the available connectors
    Conn = get_connector("era5_arco")
    async with Conn() as conn:
        ds, result = await conn.fetch(
            "era5_arco:single_levels",
            BoundingBox(min_lon=-114.5, min_lat=50.7, max_lon=-114.0, max_lat=51.1),
            TimeRange(start=..., end=...),  # fill in the period of interest
            variables=[CanonicalVar.AIR_TEMPERATURE, CanonicalVar.PRECIPITATION_FLUX],
        )
    # ds: lazy canonical cube;  result: FetchResult provenance/shape metadata
    return ds, result


# `async with` and `await` need a running event loop outside a notebook/REPL
ds, result = asyncio.run(main())
```

## Adding a connector

Subclass `BaseForcingConnector` (optionally mix in `ZarrStoreMixin`), implement
`list_products()` and `fetch()`, declare a `VariableMapping` table mapping native
names → canonical vars + linear unit conversions, and decorate with
`@register("slug")`. `discover()` finds it automatically.

## Providers

33 connectors — 31 live-verified against their upstream stores (19 anonymous +
12 auth-gated, confirmed with real CDS and Earthdata credentials); `mswep` and
`em_earth` are offline-verified pending access/credentials. Highlights:

| Category | Products |
|---|---|
| **Global / regional reanalyses** | ERA5 (ARCO + CDS), ERA5-Land, MERRA-2, CARRA, CERRA, RDRS/CaSR, BARRA-R2, CONUS404, NARR, WFDE5 |
| **Analysis / observation grids** | AORC (+ NWM grid), NLDAS-2, HRRR, NWM operational, Daymet, gridMET, nClimGrid-Daily, GLDAS, FLDAS, E-OBS |
| **Satellite / merged precipitation** | CHIRPS, CHIRTS, GPM IMERG, PERSIANN-CDR, CMORPH, MSWEP, EM-Earth |
| **Forecasts** | GFS (deterministic), GEFS (ensemble, `member` dimension) |
| **Climate projections** | NEX-GDDP-CMIP6, NA-CORDEX |

The full per-provider table — grid type, access protocol, auth, verification
status, and the per-provider caveats (rolling archive windows, unverified
units, slow OPeNDAP paths, derivation notes) — lives in the
[provider catalog](https://darriey.github.io/CFS/catalog/), with the
machine-readable version in
[`inventory/providers.yaml`](inventory/providers.yaml).

CDS connectors need `~/.cdsapirc`; Earthdata connectors need
`EARTHDATA_TOKEN` (or `~/.netrc` / `EARTHDATA_USERNAME`+`PASSWORD`) with the
"NASA GESDISC DATA ARCHIVE" app authorized. GFS/GEFS need the `forecast`
extra:

```bash
pip install 'community-forcing-service[climate,cds,earthdata,forecast]'
```

Note that CFS is a passthrough service — every fetch hits the provider's live
store, so transient upstream outages (THREDDS restarts, S3 hiccups, CDS queue
congestion) can surface as fetch errors independent of CFS itself.
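
A consumer that wants to ride out short outages can wrap its fetch call in a retry loop. The following is a minimal standard-library sketch, not part of CFS: `fetch_with_retry` and `flaky_fetch` are hypothetical names, and CFS may already retry internally (it depends on `tenacity`), in which case an outer loop like this is only a second line of defence.

```python
import asyncio


async def fetch_with_retry(fetch, attempts: int = 3, base_delay: float = 1.0):
    """Call the zero-argument coroutine function `fetch`, retrying on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await fetch()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in that fails twice before succeeding.
calls = {"n": 0}


async def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated upstream outage")
    return "dataset"


outcome = asyncio.run(fetch_with_retry(flaky_fetch, attempts=3, base_delay=0.01))
```

Catching bare `Exception` keeps the sketch short. In practice, narrow it to the error types that actually signal a transient upstream failure, so that configuration mistakes fail immediately instead of being retried.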

## Hardening / robustness

- **Range QC** (`cfs/qc.py`): every fetch samples the harmonized cube against
  each canonical variable's physical `valid_range` and reports out-of-range
  values in `FetchResult.warnings`. This catches unit-conversion bugs (a precip
  flux of 8.6 instead of 1e-4 `kg m-2 s-1`) before they reach a model. Advisory;
  never fails a fetch. Toggle with `CFS_QC_ENABLED`.
- **Fetch guardrails**: shared `_guard_area` (`CFS_MAX_AREA_DEG2`) and
  cell-count (`CFS_MAX_CELLS_PER_FETCH`) checks on the base class refuse
  accidental continental/decadal pulls; enforced uniformly via `_finalize`.
- **Reset-aware de-accumulation** (`cfs/subset/deaccumulate.py`): running-total
  fields (ERA5-Land `tp`/`ssrd`/`strd`) are converted to per-step increments
  before unit conversion, handling daily resets.
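
The de-accumulation and range-check ideas in the list above can be illustrated in a few lines of plain Python. This is a sketch under stated assumptions, not the code in `cfs/subset/deaccumulate.py` or `cfs/qc.py`: the function names are hypothetical, a reset is inferred from a drop in the running total, and the `valid_range` used here is made up for the example.

```python
def deaccumulate(totals: list[float]) -> list[float]:
    """Turn a running total into per-step increments, tolerating resets.

    Assumes the first value is the increment of the first step, and that a
    drop in the total means the accumulator was reset to zero before the step.
    """
    increments = []
    previous = 0.0
    for total in totals:
        if total < previous:
            # Reset detected: the new total is itself the increment.
            increments.append(total)
        else:
            increments.append(total - previous)
        previous = total
    return increments


def out_of_range(values: list[float], valid_range: tuple[float, float]) -> list[float]:
    """Return the values that fall outside an inclusive physical range."""
    low, high = valid_range
    return [v for v in values if not (low <= v <= high)]


# Running precipitation total (m) that resets after the third step.
steps = deaccumulate([0.001, 0.003, 0.004, 0.002, 0.002])
# A flux that is four orders of magnitude too large stands out immediately.
suspect = out_of_range([1e-4, 8.6, 2e-4], valid_range=(0.0, 0.1))
```

Inferring resets from a drop is a heuristic: it misses a reset whose first increment exceeds the previous total. An implementation that knows the product's reset schedule can avoid that case.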

## Derived variables

When a provider lacks a canonical field, CFS derives it once, in a tested place
(`cfs/derive/`). Currently: **specific humidity from relative humidity**
(`cfs/derive/humidity.py`, Bolton 1980 saturation vapour pressure) — used by
CARRA/CERRA, which ship 2 m RH rather than specific humidity. Derivation inputs
(RH) are consumed, not emitted: they do not appear in the canonical output.
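
For reference, the relative-to-specific humidity conversion can be sketched as follows. This is an illustration rather than the contents of `cfs/derive/humidity.py`: the function names are hypothetical, the Bolton (1980) coefficients are the commonly quoted ones for saturation over liquid water, and the sketch assumes air pressure at the same level is available as an input, which the text above does not specify.

```python
import math


def saturation_vapour_pressure_hpa(temp_k: float) -> float:
    """Bolton (1980) saturation vapour pressure over water, in hPa."""
    temp_c = temp_k - 273.15
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def specific_humidity(rh_percent: float, temp_k: float, pressure_pa: float) -> float:
    """Specific humidity (kg kg-1) from relative humidity, temperature, pressure."""
    e_hpa = rh_percent / 100.0 * saturation_vapour_pressure_hpa(temp_k)
    p_hpa = pressure_pa / 100.0
    # 0.622 is the ratio of the molar masses of water vapour and dry air.
    return 0.622 * e_hpa / (p_hpa - 0.378 * e_hpa)


# 50 % relative humidity at 20 degC and standard sea-level pressure
q = specific_humidity(rh_percent=50.0, temp_k=293.15, pressure_pa=101325.0)
```

At these conditions `q` comes out at roughly 7 g of water vapour per kg of moist air, a plausible mid-latitude value and a quick sanity check on the units.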

## Tests

```bash
pytest -m 'not network'    # offline: harmonization + subsetting logic
pytest -m network          # integration: real ERA5 fetch from GCS
```

## Naming note

"CFS" also denotes NOAA's **Climate Forecast System** (CFSR/CFSv2), itself a
forcing product. If a CFSR connector is ever added it must use a disambiguated
slug (e.g. `cfsr`) to avoid collision with the service name.
