Metadata-Version: 2.4
Name: denograd
Version: 1.1.0
Summary: A model-agnostic framework for gradient-based data refinement.
Home-page: https://github.com/ari-dasci/S-noise-gradient
Author: J. Javier Alonso-Ramos
Author-email: jjalonso@ugr.es
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: torch
Requires-Dist: tqdm
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# DenoGrad: A Model-Agnostic Framework for Gradient-Based Data Refinement

[![PyPI version](https://badge.fury.io/py/denograd.svg)](https://badge.fury.io/py/denograd)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Python >= 3.6](https://img.shields.io/badge/Python-%3E%3D3.6-blue.svg)](https://www.python.org/)

**DenoGrad** is a novel, model-agnostic framework for gradient-based data refinement that leverages the representational knowledge and spectral bias of deep neural networks to correct corrupted observations. It operates within the **Data-Centric AI** paradigm, where the focus shifts from improving models to improving data.

Unlike supervised denoising approaches that require clean ground truth, DenoGrad performs **input optimization**: it freezes the weights of a pre-trained backbone model and iteratively backpropagates error corrections directly into the input space, guiding noisy samples toward regions consistent with the learned data manifold.

> **Paper:** *DenoGrad: A Model-Agnostic Framework for Gradient-Based Data Refinement*
> J. Javier Alonso-Ramos, Ignacio Aguilera-Martos, Andrés Herrera-Poyatos, Francisco Herrera
> University of Granada & DaSCI Institute

### Key Features

- **Model-Agnostic**: Works with any differentiable PyTorch backbone (MLP, LSTM, xLSTM, CNN, CNN-LSTM, Transformers, TabPFN, DLinear, etc.).
- **No Clean Ground Truth Required**: Self-supervised input optimization on the noisy dataset itself.
- **Dual Domain Support**: Specialized handling for both **Static Tabular** data and **Time-Series** forecasting (via a Consensus Strategy).
- **Joint Feature-Target Optimization**: Simultaneously refines input features $X$ and continuous targets $Y$ using jointly normalized gradients.
- **Manifold Preservation**: Achieves state-of-the-art error reduction while maintaining the highest structural fidelity, evidenced by minimal Sliced Wasserstein Distance (SWD) and maximal feature correlation consistency ($\bar{\rho}$).
- **Dataset-Level Regularizer**: Yields predictive improvements even on nominally clean datasets by mitigating latent aleatory noise.

---

## Installation

DenoGrad is available on PyPI:

```bash
pip install denograd
```

Or install the latest version from source:

```bash
git clone https://github.com/ari-dasci/S-noise-gradient.git
cd S-noise-gradient
pip install .
```

**Requirements:** Python >= 3.6, PyTorch, NumPy, tqdm

---

## Quick Start

DenoGrad integrates seamlessly into existing PyTorch pipelines. You need your (noisy) data and a model that has been trained on it; the first snippet below fills these in with synthetic data and a minimal training loop purely for illustration.

### Static Tabular Data

```python
import numpy as np
import torch
import torch.nn as nn
from denograd import DenoGrad

# 0. Illustrative noisy data (replace with your own dataset)
rng = np.random.default_rng(0)
X_noisy = rng.standard_normal((1000, 10)).astype(np.float32)
y_noisy = (X_noisy.sum(axis=1, keepdims=True)
           + 0.3 * rng.standard_normal((1000, 1))).astype(np.float32)

# 1. Define and train your model on the noisy data
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1)
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
X_t, y_t = torch.from_numpy(X_noisy), torch.from_numpy(y_noisy)
for _ in range(100):  # minimal training loop; substitute your own pipeline
    optimizer.zero_grad()
    loss = criterion(model(X_t), y_t)
    loss.backward()
    optimizer.step()

# 2. Initialize DenoGrad (reuses the trained backbone)
denoiser = DenoGrad(model=model, criterion=criterion, device=torch.device('cuda'))

# 3. Fit and Transform
X_clean, y_clean, grad_x, grad_y = denoiser.fit_transform(
    X=X_noisy,          # numpy array (n_samples, n_features)
    y=y_noisy,          # numpy array (n_samples,) or (n_samples, n_targets)
    nrr=0.05,           # Noise Reduction Rate (η)
    nr_threshold=0.01,  # Gating threshold (τ)
    max_epochs=200
)
```

### Time-Series Forecasting (Consensus Strategy)

For time-series data, DenoGrad employs a **Consensus Strategy**. Since a single time step $t$ participates in multiple overlapping sliding windows, DenoGrad accumulates the gradients from every window context and averages them to produce a single, temporally consistent update.

```python
# 1. Initialize DenoGrad with a sequential model (e.g., LSTM)
denoiser = DenoGrad(model=lstm_model, criterion=nn.MSELoss())

# 2. Fit and Transform in Time-Series mode
X_clean, y_clean, _, _ = denoiser.fit_transform(
    X=X_ts_noisy,       # numpy array (total_timesteps, n_features)
    y=y_ts_noisy,       # numpy array (total_timesteps,)
    is_ts=True,         # Enable Time-Series mode
    window_size=24,     # Sliding window size (look-back period)
    future=1,           # Steps ahead the model predicts
    stride=1,           # Window stride
    nrr=0.01,
    nr_threshold=0.1,
    max_epochs=200
)
```

### Pandas DataFrame Support

```python
import pandas as pd

df = pd.DataFrame({"feat1": [...], "feat2": [...], "target": [...]})

X_clean, y_clean, _, _ = denoiser.fit_transform(
    X=df,
    y="target",          # Column name(s) to use as target
    nrr=0.05,
    max_epochs=100
)
```

---

## How It Works

In standard training, gradients update model weights $\theta$ to minimize loss. DenoGrad **inverts** this: it freezes $\theta$ and treats the data instances themselves as the trainable parameters.

### Core Update Rule

$$x' = x - \eta \cdot \frac{g_x}{\|[g_x, g_y]\|_2} \cdot \mathbb{I}_{\text{noisy}}, \qquad y' = y - \eta \cdot \frac{g_y}{\|[g_x, g_y]\|_2} \cdot \mathbb{I}_{\text{noisy}}$$

where $g_x = \nabla_x \mathcal{L}(f_\theta(x), y)$, $g_y = \nabla_y \mathcal{L}(f_\theta(x), y)$, and $\mathbb{I}_{\text{noisy}}$ is a binary gating mask.
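
As a concrete illustration, here is a minimal PyTorch sketch of one refinement step under this rule. It assumes 2-D feature tensors and a single continuous target; `denograd_step` and `tau` are hypothetical names rather than the library's API, and the joint norm is taken over the whole batch for simplicity:

```python
import torch

def denograd_step(model, criterion, x, y, nrr=0.05, tau=0.01):
    # Hedged sketch, not the library's internal loop.
    # x: (n, d) features, y: (n, 1) targets; model weights stay frozen.
    x = x.detach().clone().requires_grad_(True)
    y = y.detach().clone().requires_grad_(True)
    preds = model(x)
    loss = criterion(preds, y)
    g_x, g_y = torch.autograd.grad(loss, (x, y))

    # Gating mask I_noisy: zero the update for instances with |f(x) - y| <= tau
    mask = ((preds - y).abs() > tau).float()                 # shape (n, 1)

    # Joint normalization: divide both gradients by ||[g_x, g_y]||_2
    joint_norm = torch.cat([g_x.flatten(), g_y.flatten()]).norm() + 1e-12

    x_new = x - nrr * (g_x / joint_norm) * mask              # mask broadcasts over features
    y_new = y - nrr * (g_y / joint_norm) * mask
    return x_new.detach(), y_new.detach()
```

The shared norm keeps the relative magnitudes of the feature and target corrections balanced, as described in component 3 below.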

### Algorithm Components

1. **Input Optimization**: Compute the gradient of the loss $\mathcal{L}$ with respect to the input features $X$ and targets $Y$ via backpropagation through the frozen model.

2. **Gating Mechanism**: A threshold $\tau$ controls noise tolerance. Gradients are zeroed for any instance where $|f_\theta(x) - y| \leq \tau$, preserving high-confidence samples and preventing over-smoothing. This retained stochasticity acts as implicit regularization.

3. **Joint Normalization**: Gradients for $X$ and $Y$ are concatenated and normalized by their joint $L_2$ norm. This ensures balanced corrections across all dimensions regardless of their scale.

4. **Consensus Strategy (Time-Series)**: For sequential data, gradient contributions from all overlapping windows covering time step $t$ are accumulated into global buffers $G_t$ with visit counters $C_t$. The final update is the averaged consensus direction:

$$x_t^{\text{new}} = x_t^{\text{old}} - \eta \cdot \frac{G_t}{C_t}$$
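
A minimal NumPy sketch of this accumulation, assuming per-window input gradients have already been computed; `consensus_update` and its arguments are illustrative names, not the library's API:

```python
import numpy as np

def consensus_update(x_series, window_grads, window_starts, window_size, nrr=0.01):
    # x_series: (T, d) series; window_grads: one (window_size, d) gradient per window
    T, d = x_series.shape
    G = np.zeros((T, d))   # global gradient buffers G_t
    C = np.zeros((T, 1))   # visit counters C_t

    for g, start in zip(window_grads, window_starts):
        G[start:start + window_size] += g   # every window covering step t contributes
        C[start:start + window_size] += 1

    C = np.maximum(C, 1)                    # steps covered by no window keep a zero update
    return x_series - nrr * (G / C)         # averaged consensus direction
```

Averaging over all covering windows prevents any single window's context from dominating the update at a given time step.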

### Theoretical Foundation: Spectral Bias

DenoGrad exploits the well-documented **spectral bias** of neural networks: DNNs inherently prioritize learning low-frequency patterns (the true signal) over high-frequency variations (noise) during SGD training. Even when trained on noisy data, a sufficiently regularized model captures the underlying data manifold. The gradients derived from this model therefore direct noisy instances *toward* this learned manifold.

---

## API Reference

### `DenoGrad(model, criterion, device=None)`

| Parameter   | Type                      | Description                                        |
|-------------|---------------------------|----------------------------------------------------|
| `model`     | `nn.Module`               | Pre-trained PyTorch model (weights will be frozen). |
| `criterion` | `nn.modules.loss._Loss`   | Loss function (e.g., `nn.MSELoss()`).              |
| `device`    | `torch.device`, optional  | Compute device. Auto-detects CUDA if available.    |

The constructor automatically detects recurrent modules (RNN/LSTM/GRU) and switches to sequential processing accordingly; it also identifies CNN architectures so that input dimensions are handled correctly.

### `.fit(X, y, is_ts=False, window_size=None, future=1, stride=1, flattening=False)`

Configures the internal dataset strategy without running the denoising loop.

| Parameter     | Type                  | Default | Description                                                    |
|---------------|-----------------------|---------|----------------------------------------------------------------|
| `X`           | array / Tensor / DataFrame | —       | Input features.                                                |
| `y`           | array / Tensor / str / list | —       | Targets. If `X` is a DataFrame, can be column name(s).        |
| `is_ts`       | `bool`                | `False` | Enable Time-Series mode.                                       |
| `window_size` | `int`                 | `None`  | Sliding window size (required if `is_ts=True`).                |
| `future`      | `int`                 | `1`     | Forecasting horizon (steps ahead).                             |
| `stride`      | `int`                 | `1`     | Stride between consecutive windows.                            |
| `flattening`  | `bool`                | `False` | Flatten windows into 1D vectors (useful for MLP backbones on time-series data). |

Returns `self` for method chaining.
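
For example, assuming the `denoiser` and arrays from the Quick Start, the two-step workflow and its chained equivalent look like this:

```python
# Configure the dataset strategy, then run the optimization separately
denoiser.fit(X_noisy, y_noisy)
X_clean, y_clean, grad_x, grad_y = denoiser.transform(nrr=0.05, nr_threshold=0.01, max_epochs=200)

# Equivalent, via method chaining
X_clean, y_clean, _, _ = denoiser.fit(X_noisy, y_noisy).transform(nrr=0.05, max_epochs=200)
```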

### `.transform(nrr=0.05, nr_threshold=0.01, max_epochs=100, denoise_y=True, batch_size=1000, save_gradients=True)`

Executes the denoising optimization loop.

| Parameter       | Type    | Default | Description                                                                 |
|-----------------|---------|---------|-----------------------------------------------------------------------------|
| `nrr`           | `float` | `0.05`  | **Noise Reduction Rate** ($\eta$). Step size for input corrections.         |
| `nr_threshold`  | `float` | `0.01`  | **Gating Threshold** ($\tau$). Instances with error $\leq \tau$ are skipped.|
| `max_epochs`    | `int`   | `100`   | Maximum optimization iterations.                                            |
| `denoise_y`     | `bool`  | `True`  | Whether to also refine the target variable $Y$.                             |
| `batch_size`    | `int`   | `1000`  | Mini-batch size for the DataLoader.                                         |
| `save_gradients`| `bool`  | `True`  | Store per-epoch gradients for analysis.                                     |

Returns `(X_denoised, y_denoised, grad_x_list, grad_y_list)`.

### `.fit_transform(X, y, ..., nrr=0.05, nr_threshold=0.01, max_epochs=100, ...)`

Convenience method combining `.fit()` and `.transform()`. Accepts all parameters from both methods.

### Hyperparameter Guidelines

Based on the empirical analysis in the paper:

| Parameter | Recommended Range | Notes |
|-----------|------------------|-------|
| `nrr` ($\eta$) | 0.01 – 0.1 | Higher rates converge faster; peak performance within ~200 iterations. |
| `nr_threshold` ($\tau$) | 0.1 | Robust baseline. Can be increased for larger aleatory margins. |
| `max_epochs` | 100 – 500 | Conservative rates (0.001) require 10x more iterations without matching performance. |

---

## Experimental Results

DenoGrad was evaluated on **10 real-world datasets** (5 tabular, 5 time-series) against 7 state-of-the-art denoising baselines (DAE, DN-ResNet, PCA, WTD, EMD, KF, MA) using diverse downstream regressors (Ridge, kNN, XGBoost, DNN, TabPFN, LSTM, xLSTM, CNN-LSTM, DLinear).

### Key Results (Friedman + Nemenyi test, $\alpha = 0.05$)

| Metric | DenoGrad Avg. Rank | Best Competitor |
|--------|-------------------|-----------------|
| **Predictive Improvement (Imp%)** | **3.10** | KF (1.50) — but with severe manifold distortion |
| **Sliced Wasserstein Distance (SWD ↓)** | **1.70** | PCA (2.30) |
| **Feature Correlation ($\bar{\rho}$ ↑)** | **2.10** | DN-ResNet (1.90) |

DenoGrad uniquely occupies the **optimal Pareto front**: it achieves top-tier predictive gains while strictly preserving the topological integrity of the data. Methods that score higher in raw Imp% (e.g., KF at 98%+) do so at the cost of massive distributional distortion (SWD > 0.5, $\bar{\rho}$ < 0.3).

### Highlights

- **ECL dataset**: 98.4% average improvement across all downstream models.
- **Microsoft Stock**: 97.6% improvement.
- **Time-Series**: The *only* method maintaining >90% improvement consistently across LSTM, xLSTM, CNN-LSTM, DLinear, and XGBoost.

### Datasets Used

| Dataset | Type | Instances | Features |
|---------|------|-----------|----------|
| House Prices | Tabular | 21,436 | 19 |
| Lattice Physics | Tabular | 24,000 | 40 |
| Parkinsons | Tabular | 5,875 | 20 |
| RT-IoT 2022 | Tabular | 117,915 | 82 |
| Support2 | Tabular | 8,579 | 33 |
| Daily Climate | Time-Series | 1,576 | 4 |
| ECL | Time-Series | 6,000 | 320 |
| ETT | Time-Series | 17,420 | 7 |
| Microsoft Stock | Time-Series | 2,192 | 5 |
| WTH | Time-Series | 35,064 | 12 |

---

## Citation

If you use DenoGrad in your research, please cite our paper:

```bibtex
@article{alonso2025denograd,
  title={DenoGrad: A Model-Agnostic Framework for Gradient-Based Data Refinement},
  author={Alonso-Ramos, J. Javier and Aguilera-Martos, Ignacio and Herrera-Poyatos, Andr{\'e}s and Herrera, Francisco},
  year={2025}
}
```

---

## Acknowledgments

This work was supported by the **University of Granada** and the **Andalusian Institute of Data Science and Computational Intelligence (DaSCI)**. It is part of the project *"Ethical, Responsible and General Purpose Artificial Intelligence"* (IAFER), funded by the European Union's NextGenerationEU funds.

---

## License

This project is licensed under the **GNU Affero General Public License v3** — see the [LICENSE](LICENSE) file for details.
