Metadata-Version: 2.4
Name: tsrbench
Version: 0.1.0
Summary: TSRBench: EVT-based noise injection toolkit for evaluating time series robustness
Author: Dongbin Kim
License: MIT License
        
        Copyright (c) 2026 Dongbin Kim
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/dongbeank/TSRBench
Project-URL: Repository, https://github.com/dongbeank/TSRBench
Keywords: time-series,robustness,benchmark,noise-injection,extreme-value-theory
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scikit-learn
Requires-Dist: matplotlib
Requires-Dist: ads-evt>=0.0.4
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Dynamic: license-file

# TSRBench

**EVT-based noise injection toolkit for evaluating time series forecasting robustness**

![Python 3.8+](https://img.shields.io/badge/python-3.8%2B-blue)
![License: MIT](https://img.shields.io/badge/license-MIT-green)

TSRBench generates realistic corrupted time series datasets by injecting **level shifts** and **exponential spikes** whose amplitudes are calibrated using **Extreme Value Theory (EVT)**. This produces noise that respects the statistical properties of each series, providing a principled benchmark for evaluating how robust time series forecasting models are to data corruption.

<p align="center">
  <img src="assets/original.png" width="24%" />
  <img src="assets/original_level_shift.png" width="24%" />
  <img src="assets/original_exponential_spike.png" width="24%" />
  <img src="assets/all_noise_injected.png" width="24%" />
</p>
<p align="center">
  <em>From left to right: Original signal, Level Shift corruption, Exponential Spike corruption, Combined corruption</em>
</p>

---

## Key Features

- **EVT-calibrated amplitudes** — noise magnitudes are derived from the SPOT algorithm (Streaming Peaks-over-Threshold), ensuring corruptions are statistically grounded in the tail behavior of each time series
- **Two corruption types** — level shifts (sustained deviations) and exponential spikes (transient peaks), plus their combination
- **5 severity levels** — progressively increasing frequency, duration, and amplitude for systematic robustness evaluation
- **Column-wise processing** — each column is corrupted independently with its own EVT thresholds
- **StandardScaler pipeline** — data is normalized before injection and inverse-transformed after, preserving original scale
- **pip-installable** — `pip install -e .` for instant use as a library or CLI tool
- **Supports 4 SPOT variants** — SPOT, biSPOT, dSPOT, bidSPOT for different data characteristics

---

## Installation

```bash
git clone https://github.com/dongbeank/TSRBench.git
cd TSRBench
pip install -e .
```

### Dependencies

- `numpy`, `pandas`, `scikit-learn`, `matplotlib`
- [`ads-evt`](https://pypi.org/project/ads-evt/) (>=0.0.4) — Extreme Value Theory implementation

---

## Quick Start

### Generate benchmark corruptions (one command)

```bash
# Generate all 5 severity levels for a single dataset
python -m tsrbench \
    --data-path ETTh1.csv \
    --root-path ./dataset/ETT-small/ \
    --output-path ./dataset/ETT-small/ETTh1_noise/
```

This produces 15 files (5 severity levels × 3 noise types):
```
ETTh1_level_1_type_shift.csv
ETTh1_level_1_type_spike.csv
ETTh1_level_1_type_combined.csv
...
ETTh1_level_5_type_shift.csv
ETTh1_level_5_type_spike.csv
ETTh1_level_5_type_combined.csv
```

### Python API (5 lines)

```python
from tsrbench import CollectiveNoise
import numpy as np

cn = CollectiveNoise(seed=2025)
signal = np.random.randn(10000)  # your 1D time series
shift_noise = cn.inject_level_shift(signal, noise_level=3)
corrupted = signal + shift_noise
```

### Reproduce paper benchmarks

```bash
bash scripts/generate_benchmark.sh
```

Generates corrupted data for all 6 datasets: ETTm1, ETTm2, ETTh1, ETTh2, Electricity, Weather.

---

## How It Works

TSRBench injects two types of realistic corruptions into time series data. The key innovation is using **Extreme Value Theory (EVT)** to calibrate noise amplitudes, so corruptions are proportional to the statistical extremes of each individual series.

### Pipeline Overview

```
Original CSV
    │
    ▼
StandardScaler (fit + transform)
    │
    ▼
For each column:
    ├── SPOT algorithm → EVT thresholds (upper/lower bounds)
    ├── Poisson process → anomaly occurrence times
    ├── Geometric distribution → anomaly durations
    │
    ├── Level Shift injection (sustained deviations)
    ├── Exp Spike injection (transient peaks)
    └── Combined (max of |shift|, |spike| at each point)
    │
    ▼
StandardScaler (inverse_transform)
    │
    ▼
Corrupted CSV (same format as input)
```
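The pipeline above can be sketched in a few lines. This is a simplified illustration, not the actual TSRBench internals: `inject` is a hypothetical stand-in for a per-column noise function, and the toy usage passes zero noise so the round trip is a no-op.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def corrupt_csv(df: pd.DataFrame, inject) -> pd.DataFrame:
    """Normalize, inject noise column-wise, then restore the original scale."""
    values = df.iloc[:, 1:].to_numpy(dtype=float)  # first column is the timestamp
    scaler = StandardScaler()
    scaled = scaler.fit_transform(values)
    for j in range(scaled.shape[1]):               # each column gets its own noise
        scaled[:, j] += inject(scaled[:, j])
    restored = scaler.inverse_transform(scaled)
    out = df.copy()
    out.iloc[:, 1:] = restored                     # same format as the input CSV
    return out

# Toy usage with zero noise: output matches input up to float error
df = pd.DataFrame({"date": pd.date_range("2020-01-01", periods=100, freq="h"),
                   "a": np.random.randn(100), "b": np.random.randn(100)})
clean = corrupt_csv(df, inject=lambda col: np.zeros_like(col))
```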

### Step 1: EVT Amplitude Calibration (SPOT)

The SPOT (Streaming Peaks-over-Threshold) algorithm analyzes the tail distribution of each time series column to determine realistic anomaly thresholds. Given a risk parameter `q` (the `amp` parameter), SPOT finds threshold values that would be exceeded with probability `q`.

- **For unidirectional variants** (SPOT, dSPOT): computes upper thresholds only
- **For bidirectional variants** (biSPOT, bidSPOT): computes both upper and lower thresholds, enabling both positive and negative corruptions

The EVT thresholds are computed **per-column** and **cached across severity levels** for efficiency.
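As a rough illustration of what the risk parameter does, the sketch below uses a plain empirical quantile as a stand-in for SPOT (the real algorithm, provided by `ads-evt`, fits a Generalized Pareto distribution to threshold exceedances, which extrapolates better into the tail):

```python
import numpy as np

def tail_thresholds(x: np.ndarray, q: float):
    """Stand-in for SPOT: upper/lower thresholds exceeded with probability q.

    Empirical quantiles only illustrate the idea; SPOT fits a Generalized
    Pareto distribution to peaks over an initial threshold instead.
    """
    upper = np.quantile(x, 1.0 - q)
    lower = np.quantile(x, q)
    return upper, lower

rng = np.random.default_rng(2025)
x = rng.standard_normal(100_000)
up, lo = tail_thresholds(x, q=0.0016)  # q = the level-1 'amp' value
```

For a standard normal column, `q=0.0016` lands near the ±2.95σ quantiles, which is why smaller `q` means larger corruption amplitudes.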

### Step 2: Anomaly Occurrence (Poisson Process)

The number and location of anomalies are determined by a Poisson process:

```
N ~ Poisson(freq × T)
```

where `freq` controls the anomaly rate and `T = 2L - 1` (with `L` being the series length). A steady-state mechanism filters start points to ensure anomalies are distributed across the second half of the time window.
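A minimal sketch of this step, under the stated assumptions (Poisson event count, uniform placement over the extended window, and a keep-the-second-half filter standing in for the steady-state mechanism):

```python
import numpy as np

rng = np.random.default_rng(2025)
L = 10_000     # series length
freq = 0.004   # anomaly rate (the level-2/3 'freq' value)
T = 2 * L - 1  # extended window used by the steady-state filter

n_events = rng.poisson(freq * T)                # N ~ Poisson(freq * T)
starts = np.sort(rng.integers(0, T, n_events))  # candidate start points in [0, T)
starts = starts[starts >= L - 1] - (L - 1)      # keep the second half -> indices in [0, L)
```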

### Step 3: Anomaly Duration (Geometric Distribution)

Each anomaly's duration is drawn from a geometric distribution:

- **Level shift**: `d ~ Geometric(1/(dur-1)) + 1`
- **Exponential spike**: two durations `d1, d2 ~ Geometric(2/dur)` for the ascending and descending phases
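Assuming NumPy's geometric convention (support starting at 1, mean `1/p`), both recipes above yield durations with expected value roughly `dur`, as this sketch checks empirically:

```python
import numpy as np

rng = np.random.default_rng(2025)
dur = 12  # the level-3/4 'dur' value

# Level shift: d ~ Geometric(1/(dur-1)) + 1, so E[d] = (dur-1) + 1 = dur
shift_d = rng.geometric(1.0 / (dur - 1), size=10_000) + 1

# Exponential spike: d1, d2 ~ Geometric(2/dur), so E[d1 + d2] = dur
d1 = rng.geometric(2.0 / dur, size=10_000)
d2 = rng.geometric(2.0 / dur, size=10_000)
```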

### Step 4: Noise Injection

**Level Shift**: A sustained deviation where the signal is shifted by the EVT threshold value for the anomaly's duration.

**Exponential Spike**: A transient peak shaped by an exponential curve:

```
         ╱╲
        ╱  ╲
       ╱    ╲
      ╱      ╲
─────╱        ╲─────
     ← d1 →← d2 →
```

The peak height equals the EVT threshold at the peak position, and the curve decays exponentially on both sides.
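A minimal sketch of such a spike shape, with the peak height standing in for the EVT threshold (the exact decay rate here is an assumption for illustration, not TSRBench's parameterization):

```python
import numpy as np

def exp_spike(peak: float, d1: int, d2: int, decay: float = 1.0) -> np.ndarray:
    """Transient peak: exponential rise over d1 steps, exponential fall over d2."""
    rise = peak * np.exp(-decay * np.arange(d1, 0, -1))  # climbs toward the peak
    fall = peak * np.exp(-decay * np.arange(1, d2 + 1))  # decays away from it
    return np.concatenate([rise, [peak], fall])

spike = exp_spike(peak=3.0, d1=4, d2=6)  # peak = EVT threshold in TSRBench
```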

**Combined**: At each time step, the corruption with the larger absolute value is selected:

```python
# at each step, keep whichever corruption has the larger magnitude
combined = np.where(np.abs(spike) > np.abs(shift), spike, shift)
```

### Step 5: Bidirectional Noise

For bidirectional SPOT variants (biSPOT, bidSPOT), each anomaly is randomly assigned as positive (upward) or negative (downward) with equal probability, using the appropriate upper or lower threshold.
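A small sketch of the sign assignment, with illustrative threshold values in place of real per-column SPOT output:

```python
import numpy as np

rng = np.random.default_rng(2025)
upper, lower = 2.9, -3.1  # per-column EVT thresholds (illustrative values)

# Each anomaly goes up or down with equal probability, using the matching bound
signs = rng.integers(0, 2, size=8)              # 0 = downward, 1 = upward
amplitudes = np.where(signs == 1, upper, lower)
```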

---

## Noise Parameters

The 5 default severity levels use the following parameters:

| Level | `freq` | `dur` | `amp` (SPOT q) | Description |
|:-----:|:------:|:-----:|:---------------:|:------------|
| 1 | 0.002 | 6 | 0.0016 | Minimal — rare, short, conservative amplitude |
| 2 | 0.004 | 9 | 0.0016 | Mild — more frequent, slightly longer |
| 3 | 0.004 | 12 | 0.0004 | Moderate — longer duration, more extreme amplitude |
| 4 | 0.008 | 12 | 0.0004 | Strong — frequent, long, extreme |
| 5 | 0.008 | 15 | 0.0001 | Severe — most frequent, longest, most extreme |

> **Note**: Lower `amp` values in SPOT correspond to *more extreme* thresholds (lower exceedance probability = more extreme quantile).

### Parameter Interpretation

- **`freq`**: Controls `lambda` in the Poisson process. Higher = more anomaly events.
- **`dur`**: Controls the geometric distribution parameter. Higher = longer anomalies.
- **`amp`**: The SPOT risk parameter `q`. Lower = more extreme EVT threshold = larger noise amplitude.

---

## Custom Dataset Guide

### CSV Format

Your CSV must have:
- **First column**: Timestamps or index (string/numeric, not used for injection)
- **Remaining columns**: Numeric time series values

```csv
date,temperature,humidity,pressure
2020-01-01 00:00,21.3,65.2,1013.2
2020-01-01 01:00,20.8,66.1,1013.5
...
```

### One-Command Generation

```bash
python -m tsrbench \
    --data-path my_data.csv \
    --root-path ./my_dataset/ \
    --output-path ./my_dataset/noisy/
```

Or use the generic script:

```bash
bash scripts/generate_noise.sh ./my_dataset/ my_data.csv ./my_dataset/noisy/
```

### SPOT Parameter Tuning

| Parameter | Default | When to Adjust |
|:----------|:-------:|:---------------|
| `--spot-type` | `bidspot` | Use `bispot` for short series (<1000 pts); use `dspot`/`bidspot` for non-stationary data |
| `--spot-n-points` | `8` | Increase (10-20) for noisy data; decrease (4-6) for clean data |
| `--spot-depth` | `0.01` | Increase (0.02-0.05) for highly non-stationary series |
| `--spot-init-points` | `0.05` | Increase if SPOT fails to converge; decrease for very long series |
| `--spot-init-level` | `0.98` | Lower (0.95) for more conservative thresholds |
| `--zero-clip` | `False` | Set `True` for non-negative data (e.g., electricity consumption) |

### Custom Noise Definitions

```python
from tsrbench import CollectiveNoise

# Define your own severity levels
custom_shift = {
    1: {'freq': 0.001, 'dur': 4, 'amp': 0.002},
    2: {'freq': 0.003, 'dur': 8, 'amp': 0.001},
    3: {'freq': 0.005, 'dur': 12, 'amp': 0.0005},
}
custom_spike = {
    1: {'freq': 0.001, 'dur': 4, 'amp': 0.002},
    2: {'freq': 0.003, 'dur': 8, 'amp': 0.001},
    3: {'freq': 0.005, 'dur': 12, 'amp': 0.0005},
}

cn = CollectiveNoise(
    seed=2025,
    level_shift_args=custom_shift,
    exp_spike_args=custom_spike,
)
```

---

## Visualization

TSRBench includes visualization utilities for inspecting corruptions:

```python
from tsrbench import plot_corruption_comparison, plot_severity_levels, plot_noise_only

# Side-by-side: Original | Shift | Spike | Combined
plot_corruption_comparison(
    "dataset/ETT-small/ETTh1.csv",
    "dataset/ETT-small/ETTh1_noise/",
    column="HUFL", level=3,
    save_path="figures/comparison.png"
)

# All 5 severity levels for one noise type
plot_severity_levels(
    "dataset/ETT-small/ETTh1.csv",
    "dataset/ETT-small/ETTh1_noise/",
    column="HUFL", noise_type="combined",
    save_path="figures/severity.png"
)

# Isolated noise signal (corrupted - original)
plot_noise_only(
    "dataset/ETT-small/ETTh1.csv",
    "dataset/ETT-small/ETTh1_noise/",
    column="HUFL", level=3,
    save_path="figures/noise_only.png"
)
```

See `examples/visualize_corruptions.py` for a complete example.

---

## API Reference

### `CollectiveNoise`

```python
from tsrbench import CollectiveNoise

cn = CollectiveNoise(
    seed=2025,                  # Random seed
    level_shift_args=None,      # Dict {level: {freq, dur, amp}} or None for defaults
    exp_spike_args=None,        # Dict {level: {freq, dur, amp}} or None for defaults
    spot_args=None,             # Dict {type, n_points, depth, init_points, init_level} or None for defaults
)
```

#### Methods

| Method | Description |
|:-------|:------------|
| `inject_level_shift(X, noise_level)` | Generate level-shift noise for a 1D signal `X` at the given severity level (1-5). Returns a noise array to add to `X`. |
| `inject_exp_spike(X, noise_level)` | Generate exponential-spike noise for a 1D signal `X`. Returns a noise array. |
| `inject_noise(X, noise_level)` | Generate both shift and spike noise. Returns `(shift_noise, spike_noise)`. |
| `custom_inject_level_shift(X, freq, dur, amp)` | Inject level shift with custom parameters. |
| `custom_inject_exp_spike(X, freq, dur, amp)` | Inject exponential spike with custom parameters. |
| `make_noise_datasets(args)` | Generate all corrupted CSVs from an input dataset. See CLI args for the `args` object fields. |

---

## SPOT Algorithm Variants

| Variant | Class | Handles Non-Stationarity | Bidirectional | Best For |
|:--------|:------|:------------------------|:--------------|:---------|
| SPOT | `SPOT` | No | No | Stationary, one-sided data |
| biSPOT | `biSPOT` | No | Yes | Stationary, symmetric data |
| dSPOT | `dSPOT` | Yes | No | Non-stationary, one-sided data |
| bidSPOT | `bidSPOT` | Yes | Yes | Non-stationary, symmetric data **(default)** |

- **Non-stationarity handling** (dSPOT, bidSPOT): Uses a sliding window (`depth` parameter) to adapt thresholds to local statistics
- **Bidirectional** (biSPOT, bidSPOT): Computes both upper and lower thresholds, allowing both positive and negative corruptions

---

## Data Validation

For large datasets (e.g., Electricity with 321 columns), some columns may produce extreme corruptions due to unusual distributions. TSRBench includes a validation module to detect and fix these:

```python
from tsrbench.validate import DataValidationAndRegeneration

validator = DataValidationAndRegeneration(seed=2025)

# Check for problematic columns
problems = validator.check_problematic_columns(
    data_name='electricity',
    dataset_path='./dataset/electricity/',
    level=3,
    threshold_multiplier=3
)

# Regenerate noise for problematic columns
if problems:
    validator.extract_problematic_columns('electricity', './dataset/electricity/', problems)
    validator.regenerate_noise_data('electricity2.csv', './dataset/electricity/')
```

See `tsrbench/validate.py` for the full API.

---

## Citation

If you find this repo useful for your research, please cite our paper:

```bibtex
@inproceedings{kim2026local,
  title={Local Geometry Attention for Time Series Forecasting under Realistic Corruptions},
  author={Dongbin Kim and Youngjoo Park and Woojin Jeong and Jaewook Lee},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=NCQPCxN7ds}
}
```

---

## License

MIT License. See [LICENSE](LICENSE) for details.
