Metadata-Version: 2.4
Name: rictsc
Version: 0.1.1
Summary: Causal Time Series Modeling of Supraglacial Lake Evolution in Greenland under Distribution Shift
Author: Emam Hossain, Muhammad Hasan Ferdous, Devon Dunmire, Aneesh Subramanian, Md Osman Gani
Maintainer-email: Emam Hossain <emamh1@umbc.edu>
Project-URL: Homepage, https://github.com/ehfahad/RIC-TSC
Project-URL: Bug Tracker, https://github.com/ehfahad/RIC-TSC/issues
Project-URL: Source, https://github.com/ehfahad/RIC-TSC
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: numpy==1.26.4
Requires-Dist: pandas==2.2.3
Requires-Dist: scikit-learn==1.5.1
Requires-Dist: sktime==0.37.0
Requires-Dist: tigramite==5.2.7.0
Requires-Dist: xarray
Requires-Dist: geopandas
Requires-Dist: matplotlib
Requires-Dist: scipy
Requires-Dist: netcdf4

# RIC-TSC: Causal Time Series Modeling of Supraglacial Lake Evolution in Greenland under Distribution Shift

This repository provides the implementation for **"Causal Time Series Modeling of Supraglacial Lake Evolution in Greenland under Distribution Shift"**, a paper accepted at ICMLA 2025. We introduce a regionally informed causal framework that discovers lagged environmental drivers of supraglacial lake (SGL) evolution across Greenland and uses these causal signals for robust sequence modeling under spatial distribution shift.

---
## Introduction

Supraglacial lakes (SGLs) exhibit complex spatiotemporal behaviors such as `rapid drainage`, `slow drainage`, `refreezing`, and `burial`. Accurate classification of lake evolution is critical to understanding meltwater runoff and ice sheet stability.

This repository presents a **causally informed modeling framework** that identifies **invariant environmental drivers across Greenland** using Joint PCMCI+ (J-PCMCI+) and also captures **region-specific causal mechanisms** in individual basins. These **causal predictors** are then used in downstream **sequence modeling** to improve robustness and generalization under distribution shift. We assess performance in global, in-distribution (ID), and out-of-distribution (OOD) settings.

---

## Methodology

We construct daily multivariate time series from satellite and reanalysis sources:
- **Sentinel-1 SAR** (HV backscatter anomaly)
- **Sentinel-2 and Landsat-8 optical imagery** (NDWI-based water fraction, solar zenith)
- **CARRA-West reanalysis** (temperature, humidity, pressure, SST, etc.)

J-PCMCI+ is applied globally and per region to identify lagged causal parents of HV_anom (horizontally transmitted, vertically received backscatter anomaly), a proxy for lake water presence. These causal features are then used for lake evolution classification.
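
As a rough illustration of what a "lagged causal parent" looks like as a model input, the sketch below builds `<variable>_lag<k>` columns with pandas by shifting each lake's series independently. It is not the package's own preprocessing code: the `lake_id` and `date` column names, the helper `add_lagged_features`, and the toy values are assumptions made for this example; only the `HV_anom_lag1` naming pattern is taken from the Quickstart.

```python
import pandas as pd


def add_lagged_features(df, columns, lags, group_col="lake_id", time_col="date"):
    """Append `<col>_lag<k>` columns, shifting within each lake's own series.

    `group_col` and `time_col` are hypothetical column names for this sketch.
    """
    out = df.sort_values([group_col, time_col]).copy()
    for col in columns:
        for k in lags:
            # groupby(...).shift(k) keeps one lake's values from leaking into another's
            out[f"{col}_lag{k}"] = out.groupby(group_col)[col].shift(k)
    return out


# Toy data: two lakes, three daily observations each (values are made up).
toy = pd.DataFrame({
    "lake_id": ["a", "a", "a", "b", "b", "b"],
    "date": pd.to_datetime(["2019-07-01", "2019-07-02", "2019-07-03"] * 2),
    "HV_anom": [0.1, 0.2, 0.3, 1.0, 1.1, 1.2],
})
lagged = add_lagged_features(toy, columns=["HV_anom"], lags=[1])
print(lagged)
```

The first observation of each lake has no predecessor, so its lagged value is `NaN`; rows like these need to be dropped or imputed before fitting a classifier.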

<p align="center">
  <img src="figures/methodology.png" alt="RIC-TSC Methodology" width="800"/>
</p>

---

## Installation

Install the package in editable mode for development:

```bash
git clone https://github.com/ehfahad/RIC-TSC.git
cd RIC-TSC
pip install -e .
```
---

## Directory Structure

```bash
RIC-TSC/
├── src/rictsc/                        # Core package logic
│   ├── utils/                         # Refactored helper functions
│   ├── preprocessing.py               # Preprocessing module
│   ├── causality.py                   # Causal feature module
│   └── classification.py              # RICTSCClassifier API
├── causality/                         # J-PCMCI+ causal discovery notebooks
├── data/                              # Raw, processed, and causal datasets
├── figures/                           # Methodology diagrams and experiment visualizations
├── results/                           # Output metrics, confusion matrices, GMM plots
├── tests/                             # Package sanity tests
├── pyproject.toml                     # Package metadata and dependencies
├── run_global_classification.py       # Global pooled classification script
└── run_regionwise_classification.py   # Region-wise ID and OOD classification script
```

---

## Quickstart

### 1. Command Line Interface
Run the pipeline directly from the terminal using the installed entry points:

```bash
# Step 1: Preprocess time series for all lakes
rictsc-preprocess

# Step 2: Extract region-specific causal datasets
rictsc-causal
```

### 2. Python API
Integrate the RIC-TSC classifier into your own scripts:

```python
from rictsc import RICTSCClassifier
import pandas as pd

# Initialize the classifier
model = RICTSCClassifier(seed=42)

# Load data and fit model on causal features
feature_cols = ["HV_anom_lag1", "S2_water", "r2"]
df = pd.read_csv("data/region_causal_datasets/CW_causal_timeseries.csv")
model.fit(df, feature_cols=feature_cols, label_col="label")

# Predict on new sequences
# (test_df is not defined here: supply a held-out DataFrame with the same feature columns)
predictions = model.predict(test_df, feature_cols=feature_cols)
```
---

## Output Structure

```bash
results/
├── global_classification/
│   └── global_classification_results.csv  # Metrics for global experiment comparing causal vs. baseline models
│
├── region_specific_classification/
│   ├── id_results.csv                     # Region-wise ID results comparing causal vs. baseline models
│   └── ood_results.csv                    # OOD results where models are trained on one region and tested on the other five
```

---

## Experiments

We evaluate RIC-TSC under three experimental settings:

- **Global**: Train/test on pooled lake data from all six regions using an 80/20 split stratified by region.  
- **In-Distribution (ID)**: For each region, an 80/20 train/test split is applied to that region’s lakes.  
- **Out-of-Distribution (OOD)**: Train on a single region and test on the remaining five, assessing generalization beyond the training domain.
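
The three settings above can be sketched at the lake level with pandas and scikit-learn. This is an illustrative reconstruction, not the code in `run_global_classification.py` or `run_regionwise_classification.py`: the lake table, the `lake_id` and `region` column names, the placeholder region codes `R1` to `R6`, and the random seed are all assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical lake table: 6 placeholder regions with 5 lakes each.
lakes = pd.DataFrame({
    "lake_id": range(30),
    "region": [f"R{i % 6 + 1}" for i in range(30)],
})

# Global: 80/20 split over the pooled lakes, stratified by region.
global_train, global_test = train_test_split(
    lakes, test_size=0.2, stratify=lakes["region"], random_state=42
)

# ID: an independent 80/20 split inside each region.
id_splits = {
    region: train_test_split(group, test_size=0.2, random_state=42)
    for region, group in lakes.groupby("region")
}

# OOD: train on one region, test on the remaining five.
ood_splits = {
    region: (lakes[lakes["region"] == region], lakes[lakes["region"] != region])
    for region in sorted(lakes["region"].unique())
}

for region, (train, test) in ood_splits.items():
    print(region, len(train), "train lakes,", len(test), "test lakes")
```

Splitting by lake rather than by individual time step keeps every observation of a given lake on one side of the split, which avoids leaking a lake's own history into its test set.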

Each setting compares two models:

- **Causal Model**: Trained only on the lagged causal parents discovered by J-PCMCI+ for each region.  
- **Baseline Model**: Trained using all available features, with no causal feature selection or temporal lag filtering.

Performance is reported using overall accuracy, macro-averaged F1, precision, and recall.
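
For readers who want to reproduce these numbers on their own predictions, the four metrics can be computed with scikit-learn as in the sketch below. The helper name `summarize` and the toy labels (borrowed from the behaviors listed in the Introduction) are assumptions for illustration; the repository's scripts may compute and name the metrics differently.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def summarize(y_true, y_pred):
    """Return overall accuracy plus macro-averaged precision, recall, and F1."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1,
        "macro_precision": precision,
        "macro_recall": recall,
    }


# Toy labels only; these are not results from the paper.
y_true = ["rapid drainage", "slow drainage", "refreezing", "burial"]
y_pred = ["rapid drainage", "slow drainage", "refreezing", "refreezing"]
metrics = summarize(y_true, y_pred)
print(metrics)
```

Macro averaging weights every class equally, so a class that is never predicted (here `burial`) pulls the macro scores down even when overall accuracy looks acceptable.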

---

## Citation

This work has been accepted at ICMLA 2025. Please cite as:

```bibtex
@misc{hossain2025rictsc,
  title={Causal Time Series Modeling of Supraglacial Lake Evolution in Greenland under Distribution Shift},
  author={Emam Hossain and Muhammad Hasan Ferdous and Devon Dunmire and Aneesh Subramanian and Md Osman Gani},
  year={2025},
  note={Accepted for publication in 2025 International Conference on Machine Learning and Applications (ICMLA)}
}
```

---
