Metadata-Version: 2.4
Name: rsd
Version: 1.0.0
Summary: Reference-based standardization framework for hydroclimate drought indices
Project-URL: Homepage, https://github.com/thchilly/rsd
Project-URL: Repository, https://github.com/thchilly/rsd
Project-URL: Documentation, https://hydro-rsd.readthedocs.io
Project-URL: Issues, https://github.com/thchilly/rsd/issues
Project-URL: Changelog, https://github.com/thchilly/rsd/blob/main/CHANGELOG.md
Author-email: Athanasios Tsilimigkras <atsilimigkras1@tuc.gr>
License-Expression: BSD-3-Clause
License-File: LICENSE
Keywords: SPEI,SPI,SSI,drought,extreme-value,hydroclimate,hydrology,standardized-index
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Atmospheric Science
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.10
Provides-Extra: all
Requires-Dist: dask[array]>=2023.1; extra == 'all'
Requires-Dist: matplotlib>=3.7; extra == 'all'
Requires-Dist: xarray>=2023.1; extra == 'all'
Provides-Extra: dev
Requires-Dist: dask[array]>=2023.1; extra == 'dev'
Requires-Dist: matplotlib>=3.7; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: xarray>=2023.1; extra == 'dev'
Provides-Extra: diagnostics
Requires-Dist: matplotlib>=3.7; extra == 'diagnostics'
Provides-Extra: xarray
Requires-Dist: dask[array]>=2023.1; extra == 'xarray'
Requires-Dist: xarray>=2023.1; extra == 'xarray'
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/thchilly/rsd/main/assets/rsd_logo_banner.png" alt="rsd - Reference-based standardization framework for drought indices under distribution shift" width="100%">
</p>

# rsd

*Standardized drought indices (SPI, SSI, SDI, SPEI) that are **comparable** across model runs, scenarios, and reanalysis products.*

Full documentation: <https://hydro-rsd.readthedocs.io>

`rsd` computes standardized hydroclimate indices by fitting the CDF from a
fixed reference dataset rather than from the target itself, so that values
from different model runs, scenarios, or observational products can be
compared on the same scale.

## Why RSD?

Standard implementations of SPI/SSI standardize each series against itself,
which removes the cross-series differences you want to measure. RSD addresses
this with three interdependent components:

1. **Fixed reference** - fit the CDF once from a reference period or
   dataset; evaluate target values against it.
2. **GPD tail extension** - empirical CDFs cap at the observed range. RSD
   fits Generalized Pareto Distribution tails for smooth extrapolation
   beyond the reference sample.
3. **Pooled deseasonalization** - per-month samples are too sparse to
   fit extreme-value (EVT) tails. A 50-year record gives ~50 values per calendar month;
   the top/bottom 10% (the tail) is only ~5 exceedances per month - too
   few for a stable GPD fit. RSD removes the per-month location and
   scale, then pools all 12 months into one sample (~600 values, ~60
   exceedances per tail) where the fit becomes feasible.

This pooling is what keeps RSD usable on short records. With 20 years
of monthly data (typical for satellite-era datasets) the pooled sample
has 240 values, so each tail still has ~24 exceedances - enough to fit
a single GPD - while a per-month fit would have only ~2 exceedances per
month per tail, which is infeasible.
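
The exceedance counts quoted above follow from simple arithmetic. A minimal sketch, assuming a 10% tail fraction as in the text; `tail_exceedances` is a hypothetical helper for illustration and is not part of `rsd`:

```python
def tail_exceedances(n_years: int, tail_fraction: float = 0.10) -> tuple[int, int]:
    """Return (per-month, pooled) exceedance counts per tail for monthly data."""
    per_month = round(n_years * tail_fraction)       # one value per year per month
    pooled = round(n_years * 12 * tail_fraction)     # all 12 months pooled
    return per_month, pooled


for years in (50, 20):
    per_month, pooled = tail_exceedances(years)
    print(f"{years} years: ~{per_month} per month vs ~{pooled} pooled")
```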

Monthwise ECDF and fully parametric (e.g. gamma) methods are also included
as baselines.
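
The effect of a fixed reference can be illustrated without `rsd` itself. The sketch below is an assumption-laden toy, not the package's implementation: `ecdf_z` is a hypothetical helper that scores values against a plain empirical CDF (no monthly handling, no GPD tails), and the gamma parameters are made up. It shows that a drier target standardized against itself averages to zero, while the same target scored against the reference keeps its dry bias.

```python
from statistics import NormalDist

import numpy as np


def ecdf_z(values, sample):
    """Score `values` against the empirical CDF of `sample`, as z-scores."""
    sample = np.sort(np.asarray(sample, dtype=float))
    ranks = np.searchsorted(sample, values, side="right")
    # Dividing by n + 1 (and clipping ranks to 1..n) keeps p inside (0, 1).
    p = np.clip(ranks, 1, sample.size) / (sample.size + 1)
    inv = NormalDist().inv_cdf
    return np.array([inv(float(pi)) for pi in p])


rng = np.random.default_rng(0)
ref = rng.gamma(shape=2.0, scale=5.0, size=600)   # reference climate
tgt = rng.gamma(shape=2.0, scale=3.5, size=600)   # drier target (illustrative)

z_self = ecdf_z(tgt, tgt)   # self-standardized: the shift is erased
z_ref = ecdf_z(tgt, ref)    # reference-standardized: the shift is preserved

print(f"mean z (self):      {z_self.mean():+.2f}")
print(f"mean z (reference): {z_ref.mean():+.2f}")
```

Note that this plain ECDF cannot score values beyond the reference range any differently from the most extreme reference value, which is the limitation the GPD tails are meant to address.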

## Requirements

- Python ≥ 3.10
- NumPy ≥ 1.24
- SciPy ≥ 1.10

Optional extras:

- `[xarray]` - xarray ≥ 2023.1, dask ≥ 2023.1 (N-D + parallel computation)
- `[diagnostics]` - matplotlib ≥ 3.7 (`rsd.diagnose` plots)
- `[all]` - both of the above

## Installation

```bash
pip install rsd                    # NumPy/SciPy only
pip install "rsd[xarray]"          # adds xarray + dask support
pip install "rsd[diagnostics]"     # adds matplotlib for rsd.diagnose
pip install "rsd[all]"             # everything above (quotes protect [] in zsh)
```

## Quick start

In RSD vocabulary, the **reference** defines what "normal" looks like (e.g.
observed climate over a baseline period) and the **target** is the series
you want to score against that normal (e.g. a future projection or a
different scenario). Output `z` is in standard-normal units: `z ≈ 0` is
near-normal relative to the reference climatology and `|z| > 2` is extreme.

```python
import numpy as np
import rsd

# 1-D: standardize a 1200-month target against a 600-month reference
rng = np.random.default_rng(0)
months_ref = np.tile(np.arange(1, 13), 50)    # 600 months
months_tgt = np.tile(np.arange(1, 13), 100)   # 1200 months
ref = rng.gamma(shape=2, scale=5, size=600)
tgt = rng.gamma(shape=2, scale=5, size=1200)

z = rsd.standardize(
    target=tgt,
    reference=ref,
    months_target=months_tgt,
    months_reference=months_ref,
    scale=3,                                   # 3-month accumulation (e.g. SPI-3)
)
```
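
Because `z` is in standard-normal units, it maps directly to a probability under the reference climatology. A minimal sketch using only the standard library; `nonexceedance_probability` is a hypothetical helper, and the three `z` values are illustrative rather than categories defined by `rsd`:

```python
from statistics import NormalDist


def nonexceedance_probability(z: float) -> float:
    """Probability of a value at or below `z` under the standard normal."""
    return NormalDist().cdf(z)


for z_value in (-2.0, 0.0, 2.0):
    p = nonexceedance_probability(z_value)
    print(f"z = {z_value:+.1f} -> p = {p:.4f}")
```

A value of `z = -2` therefore sits at roughly the 2.3rd percentile of the reference.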

```python
# N-D: xarray wrapper (dask-parallelized for large grids).
# Months are extracted automatically from the time coordinate, so you do
# not pass months_target / months_reference here.
import rsd

z = rsd.standardize_xr(
    target=target_da,        # xr.DataArray with a "time" dimension
    reference=ref_da,        # xr.DataArray with a "time" dimension
    method="rsd",            # or "monthwise_ecdf" / "monthwise_parametric"
    scale=3,
    parallel=True,
)
```
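
For the 1-D API, where months are passed explicitly, the same month labels can be derived from a NumPy datetime array. A minimal sketch, assuming monthly timestamps are available as `datetime64` values (the date range is made up for illustration):

```python
import numpy as np

# Hypothetical monthly timestamps: January 2000 through December 2004.
time = np.arange("2000-01", "2005-01", dtype="datetime64[M]")

# datetime64[M] counts months since 1970-01, so modulo 12 gives 0..11;
# adding 1 yields calendar months 1..12.
months = time.astype("int64") % 12 + 1

print(months[:12])
```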

### Diagnostics

`rsd.diagnose(values, months, name, ...)` is the one-call entry point
that verifies the exchangeability assumption underlying RSD pooling. It
prints an overview block (configuration plus extracted seasonal location
and scale), renders a combined summary figure (before / after
deseasonalization KDEs), and prints an Anderson-Darling omnibus test
and a per-month Kolmogorov-Smirnov leave-one-out report. Pass `bounds=(L, U)`
to add a logit-bounded pathway alongside the baseline; add
`auto_bounds=True` to also see a heuristic data-driven bound via
`rsd.estimate_bounds`. Use `quiet=True` for batch / CI runs. See the
`diagnostics_showcase.ipynb` notebook for a worked example.
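
To make the leave-one-out idea concrete, here is a rough sketch of a per-month Kolmogorov-Smirnov check written directly against SciPy. It is not the code behind `rsd.diagnose`: `ks_leave_one_out` is a hypothetical helper, the mean / standard-deviation deseasonalization is an assumption, the synthetic data are invented, and the Anderson-Darling omnibus test is left out.

```python
import numpy as np
from scipy import stats


def ks_leave_one_out(values, months):
    """KS p-value of each calendar month against the other eleven pooled."""
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    return {
        m: stats.ks_2samp(values[months == m], values[months != m]).pvalue
        for m in range(1, 13)
    }


rng = np.random.default_rng(0)
months = np.tile(np.arange(1, 13), 50)                # 50 synthetic years
seasonal_scale = 1.0 + 2.0 * (months - 1) / 11.0      # scale grows Jan -> Dec
raw = rng.gamma(shape=2.0, scale=seasonal_scale)

# Assumed deseasonalization: subtract the monthly mean, divide by monthly std.
loc = np.array([raw[months == m].mean() for m in range(1, 13)])
scl = np.array([raw[months == m].std(ddof=1) for m in range(1, 13)])
deseason = (raw - loc[months - 1]) / scl[months - 1]

p_raw = ks_leave_one_out(raw, months)
p_des = ks_leave_one_out(deseason, months)
print(f"smallest p before: {min(p_raw.values()):.2g}")
print(f"smallest p after:  {min(p_des.values()):.2g}")
```

Small p-values flag months whose distribution differs from the rest of the pool; after a successful deseasonalization they should no longer be systematically small.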

## Methods

| method | description |
|--------|-------------|
| `"rsd"` | Deseasonalize -> pool -> ECDF core + GPD tails |
| `"monthwise_ecdf"` | Per-month empirical CDF (classical SPI-style) |
| `"monthwise_parametric"` | Per-month parametric fit (gamma, norm, …) |

The `monthwise_ecdf` baseline matches the SDAT framework of
[Farahmand & AghaKouchak (2015)](https://doi.org/10.1016/j.advwatres.2014.11.012).
The `monthwise_parametric` path defaults to `floc=0` (the canonical SPI
convention of [Stagge et al. (2015)](https://doi.org/10.1002/joc.4267));
pass `floc=None` to recover scipy's free-location 3-parameter fit.
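
The `"rsd"` row of the table (deseasonalize -> pool -> ECDF core + GPD tails) can be sketched with NumPy and SciPy alone. This is an illustration under stated assumptions, not the package's implementation: `fit_reference` and `score` are hypothetical names, monthly mean and standard deviation stand in for the location and scale, the tail fraction is fixed at 10%, the GPD is fitted by SciPy's maximum likelihood with zero location, and accumulation (`scale=`) is ignored.

```python
import numpy as np
from scipy import stats


def fit_reference(ref, months, tail=0.10):
    """Per-month location/scale, pooled sample, and one GPD fit per tail."""
    ref = np.asarray(ref, dtype=float)
    months = np.asarray(months)
    loc = np.array([ref[months == m].mean() for m in range(1, 13)])
    scl = np.array([ref[months == m].std(ddof=1) for m in range(1, 13)])
    pooled = np.sort((ref - loc[months - 1]) / scl[months - 1])
    u_lo, u_hi = np.quantile(pooled, [tail, 1.0 - tail])
    # Fit GPDs to exceedances beyond each threshold (location fixed at 0).
    c_hi, _, s_hi = stats.genpareto.fit(pooled[pooled > u_hi] - u_hi, floc=0)
    c_lo, _, s_lo = stats.genpareto.fit(u_lo - pooled[pooled < u_lo], floc=0)
    return {"loc": loc, "scl": scl, "pooled": pooled, "tail": tail,
            "lo": (u_lo, c_lo, s_lo), "hi": (u_hi, c_hi, s_hi)}


def score(tgt, months, fit):
    """Map target values to z-scores using the fitted reference."""
    tgt = np.asarray(tgt, dtype=float)
    months = np.asarray(months)
    x = (tgt - fit["loc"][months - 1]) / fit["scl"][months - 1]
    pooled, tail = fit["pooled"], fit["tail"]
    # ECDF core: rank within the pooled reference sample.
    p = np.searchsorted(pooled, x, side="right") / (pooled.size + 1)
    # GPD tails: replace the ECDF beyond each threshold.
    u_hi, c_hi, s_hi = fit["hi"]
    hi = x > u_hi
    p[hi] = 1.0 - tail * stats.genpareto.sf(x[hi] - u_hi, c_hi, scale=s_hi)
    u_lo, c_lo, s_lo = fit["lo"]
    lo = x < u_lo
    p[lo] = tail * stats.genpareto.sf(u_lo - x[lo], c_lo, scale=s_lo)
    # Clip so that the normal quantile stays finite.
    return stats.norm.ppf(np.clip(p, 1e-12, 1.0 - 1e-12))


rng = np.random.default_rng(0)
months_ref = np.tile(np.arange(1, 13), 50)
ref = rng.gamma(shape=2.0, scale=5.0, size=600)
fit = fit_reference(ref, months_ref)

z = score(ref, months_ref, fit)
# A January value far beyond the reference range still gets a finite score.
z_far = score(np.array([3.0 * ref.max()]), np.array([1]), fit)
```

Unlike a plain ECDF, the fitted upper tail lets `z_far` land above every reference score instead of saturating at the reference maximum.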

## Contributing & issues

Bug reports and questions are welcome at
<https://github.com/thchilly/rsd/issues>. Contributions follow the workflow
in [CONTRIBUTING.md](https://github.com/thchilly/rsd/blob/main/CONTRIBUTING.md).

## How to cite

If you use this package in your research, please cite the methodology paper:

> Tsilimigkras, A., Grillakis, M., & Koutroulis, A. (2026). *A
> reference-based standardization framework for hydroclimate drought
> indices under distribution shift*. Manuscript submitted to *Water
> Resources Research*. DOI pending acceptance.

For reproducibility, you may additionally cite the specific software version:

> Tsilimigkras, A. (2026). *rsd*: Reference-based standardization
> framework for hydroclimate drought indices (Version 1.0.0)
> [Computer software]. Zenodo DOI: pending.

## License

BSD 3-Clause
