Metadata-Version: 2.4
Name: community-streamflow-service
Version: 0.2.0
Summary: Community Streamflow Service — live acquisition and harmonization of global streamflow observations
Project-URL: Homepage, https://github.com/DarriEy/CSFS
Project-URL: Repository, https://github.com/DarriEy/CSFS
Project-URL: Documentation, https://darriey.github.io/CSFS/
Project-URL: Issues, https://github.com/DarriEy/CSFS/issues
Project-URL: Changelog, https://github.com/DarriEy/CSFS/blob/main/CHANGELOG.md
Author-email: Darri Eythorsson <dae5@hi.is>
License-Expression: GPL-3.0-or-later
License-File: LICENSE
Keywords: hydrology,open-data,streamflow,time-series,water
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Hydrology
Requires-Python: >=3.11
Requires-Dist: cdsapi>=0.7.2
Requires-Dist: cfgrib>=0.9.10
Requires-Dist: click>=8.0
Requires-Dist: croniter>=2.0
Requires-Dist: duckdb>=1.0
Requires-Dist: httpx>=0.27
Requires-Dist: pyarrow>=15.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyproj>=3.6
Requires-Dist: pytz>=2024.1
Requires-Dist: pyyaml>=6.0
Requires-Dist: structlog>=24.0
Requires-Dist: tenacity>=8.0
Requires-Dist: xarray>=2024.1
Provides-Extra: api
Requires-Dist: fastapi>=0.111; extra == 'api'
Requires-Dist: uvicorn[standard]>=0.30; extra == 'api'
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pre-commit>=3.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: types-croniter; extra == 'dev'
Requires-Dist: types-pyyaml; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs>=1.6; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.25; extra == 'docs'
Provides-Extra: pandas
Requires-Dist: pandas>=2.0; extra == 'pandas'
Description-Content-Type: text/markdown

# CSFS — Community Streamflow Service

**Live acquisition and harmonization of global streamflow observations.**

[![CI](https://github.com/DarriEy/CSFS/actions/workflows/ci.yml/badge.svg)](https://github.com/DarriEy/CSFS/actions/workflows/ci.yml)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)

CSFS connects to open streamflow data providers worldwide — national
hydrological agencies, regional networks, research archives, and global model
products — harmonizes their observations into one canonical
station/observation schema (discharge in m³/s, timestamps in UTC), and
maintains a near-real-time DuckDB store with scheduled acquisition, health
monitoring, a CLI, and a FastAPI read layer.

**Documentation:** <https://darriey.github.io/CSFS/>

## Why CSFS?

Programmatic access to river discharge data is fragmented: the community
relies either on *static archives* (GRDC, Caravan, GSIM, EStreams, CAMELS)
that are frozen at publication time, or on *single-agency clients* (USGS
`dataretrieval`, `hydrofunctions`) that each cover one network. Getting
current discharge across, say, France, Brazil, and Japan means learning three
APIs, three formats, and three unit conventions. CSFS provides a single
interface for live, multi-provider acquisition — one connector per agency,
every observation normalized to a common schema, re-acquisition scheduled to
each provider's update cadence — and keeps its provider roster honest
mechanically, with CI-enforced integrity tests.
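
As a rough illustration of what normalizing to a common schema involves, the sketch below converts a reading reported in cubic feet per second at a local UTC offset into m³/s at UTC. The `Observation` class, the `normalize` function, and the unit labels are hypothetical names chosen for this example; they are not the CSFS data models.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Conversion factors to cubic metres per second (illustrative unit labels).
_TO_M3S = {
    "m3/s": 1.0,
    "l/s": 0.001,
    "ft3/s": 0.3048**3,  # one cubic foot expressed in cubic metres
}


@dataclass(frozen=True)
class Observation:
    """Hypothetical canonical record: UTC timestamp, discharge in m3/s."""

    station_id: str
    timestamp: datetime
    discharge_m3s: float


def normalize(station_id: str, when: datetime, value: float, unit: str) -> Observation:
    """Convert one provider reading to the canonical unit and time zone."""
    if when.tzinfo is None:
        # A naive timestamp cannot be converted safely, so reject it.
        raise ValueError("timestamp must be timezone-aware")
    if unit not in _TO_M3S:
        raise ValueError(f"unsupported unit: {unit!r}")
    return Observation(
        station_id=station_id,
        timestamp=when.astimezone(timezone.utc),
        discharge_m3s=value * _TO_M3S[unit],
    )


# A reading of 100 ft3/s taken at 07:00 local time, five hours behind UTC.
local = datetime(2024, 1, 1, 7, 0, tzinfo=timezone(timedelta(hours=-5)))
reading = normalize("usgs:01646500", local, 100.0, "ft3/s")
print(reading)
```

Rejecting naive timestamps rather than guessing a time zone is a design choice made for this sketch; a real connector may resolve the zone from station metadata instead.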

## Provider roster (the honest numbers)

- **104 sources cataloged** in [`inventory/providers.yaml`](inventory/providers.yaml),
  labeled by readiness: **78 implemented**, 17 research, 5 fallback,
  3 manual, 1 deprecated.
- **86 connectors registered in code** — the 78 `implemented` entries plus 8
  still labeled `research` while their upstream data paths are validated.
- **41 implemented providers are realtime/near-realtime**; the rest are
  recent/archive sources, including roughly a dozen offline research archives
  (GRDC, Caravan, GSIM, EStreams, LamaH, CAMELS variants, ROBIN, ADHI, SIEREM).

These statuses are **CI-enforced**: `tests/test_connector_integrity.py`
fails the build if a connector ships without tests, lacks a scheduler tier,
or if the inventory claims `implemented` for a connector that does not exist.
See the full [provider catalog](https://darriey.github.io/CSFS/catalog/).
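
The three failure conditions listed above can be sketched as a plain function. This is a simplified stand-in, not the contents of `tests/test_connector_integrity.py`: the `roster_problems` name, its arguments, the `hourly` tier label, and the `ghost` and `pilot` providers are all assumptions made for illustration.

```python
def roster_problems(
    inventory: dict[str, str],
    registered: set[str],
    tested: set[str],
    tiers: dict[str, str],
) -> list[str]:
    """Return a sorted list of roster violations; an empty list means a clean roster."""
    problems: list[str] = []

    # The inventory must not claim `implemented` for a connector that does not exist.
    for name, status in inventory.items():
        if status == "implemented" and name not in registered:
            problems.append(f"{name}: inventory says implemented, but no connector is registered")

    # Every registered connector needs tests and a scheduler tier.
    for name in registered:
        if name not in tested:
            problems.append(f"{name}: connector has no tests")
        if name not in tiers:
            problems.append(f"{name}: connector has no scheduler tier")

    return sorted(problems)


# Hypothetical roster: `ghost` is claimed but missing, `pilot` is incomplete.
problems = roster_problems(
    inventory={"usgs": "implemented", "ghost": "implemented", "pilot": "research"},
    registered={"usgs", "pilot"},
    tested={"usgs"},
    tiers={"usgs": "hourly"},
)
for line in problems:
    print(line)
```

In a test suite, asserting that the returned list is empty would fail the build and print every violation at once.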

> Note: live-provider commands talk to real agency APIs and can hit transient
> upstream outages, so a failed fetch usually points to the provider rather
> than to your setup.

## Install

```bash
pip install community-streamflow-service            # core
pip install "community-streamflow-service[pandas]"  # + DataFrame store queries
pip install "community-streamflow-service[api]"     # + FastAPI read layer
```

Requires Python 3.11+.

## Quick start (CLI)

```bash
csfs providers                          # list registered providers + tiers
csfs fetch -p usgs --lookback 168 -n 50 # fetch a week (168 h) of USGS data
csfs status                             # what's in the local DuckDB
csfs health                             # per-connector freshness + run health
csfs serve                              # HTTP read layer (needs the api extra)
```

## Quick start (Python)

```python
import asyncio

import csfs


async def main() -> None:
    async with csfs.open_store("csfs.duckdb", read_only=False) as store:
        await csfs.run_acquisition(store, providers=["usgs"], lookback_hours=48, max_stations=20)

        stations = await store.get_stations(provider="usgs", limit=5)
        # pandas DataFrame indexed by timestamp (needs the [pandas] extra);
        # get_observations() / get_observations_arrow() need no extra.
        df = await store.get_observations_df(stations[0].id)
        print(df["discharge_m3s"].describe())


asyncio.run(main())
```

Or pull one gauge's series straight from a provider, no database involved:

```python
from datetime import UTC, datetime, timedelta

import csfs

end = datetime.now(UTC)
chunk = csfs.fetch_observations_sync("usgs", "usgs:01646500", start=end - timedelta(days=7), end=end)
```

The store is a plain DuckDB file — any SQL/pandas/Arrow tooling works on it
directly. The blessed, stable surface is what `import csfs` re-exports; see
the [Python API guide](https://darriey.github.io/CSFS/python-api/).

## API keys

Most connectors need no credentials. Exceptions: **`norway_nve`** (free
[NVE HydAPI](https://hydapi.nve.no/) key) and **`glofas`**
([Copernicus CDS](https://cds.climate.copernicus.eu/) token in `~/.cdsapirc`).
Keep keys out of tracked config files.
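
One common way to keep a key out of tracked files is to read it from the environment at run time. The sketch below shows that pattern only; the `require_key` helper and the `NVE_API_KEY` variable name are hypothetical, and the way CSFS itself expects the key to be supplied may differ (check the documentation).

```python
import os
from collections.abc import Mapping


def require_key(env_var: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return a non-empty secret from the environment or fail with a clear message."""
    value = environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Set the {env_var} environment variable before using this connector.")
    return value


# Passing a mapping explicitly makes the helper easy to exercise without
# touching the real environment. The variable name is an assumption.
demo_key = require_key("NVE_API_KEY", {"NVE_API_KEY": "  example-key  "})
print(len(demo_key))
```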

## Architecture

```text
connectors/     Provider plugins (one per data source)
core/           Canonical data models, registry, health, exceptions
store/          Persistence layer (DuckDB default)
scheduler/      Acquisition runner, cron tiers, daemon
api/            FastAPI query layer
cli/            Command-line interface
inventory/      Global provider inventory (YAML)
```

Details — including the roster-integrity guard system and the hermetic test
policy — in the [architecture docs](https://darriey.github.io/CSFS/architecture/).

## Contributing

The most valuable contribution is a new provider connector. See
[CONTRIBUTING.md](CONTRIBUTING.md) for the walkthrough and the
roster-integrity requirements your PR must satisfy.

## Citing

See [CITATION.cff](CITATION.cff).

## License

GPL-3.0-or-later. See [LICENSE](LICENSE).
