Metadata-Version: 2.4
Name: torch-ieof
Version: 0.2.0
Summary: PyTorch TIEOF: higher-order tensor (PARAFAC/HOOI/HOSVD) DINEOF reconstruction for missing satellite data
Project-URL: Homepage, https://github.com/lukegre/torch-ieof
Project-URL: Repository, https://github.com/lukegre/torch-ieof
Project-URL: Issues, https://github.com/lukegre/torch-ieof/issues
Project-URL: Paper, https://www.mdpi.com/2073-4441/13/18/2578
Project-URL: Original implementation, https://github.com/theleokul/tieof
Author: Leonid Kulikov, Natalia Inkova, Daria Cherniuk, Anton Teslyuk, Zorigto Namsaraev
Maintainer-email: Luke Gregor <luke.gregor@sdsc.ethz.ch>, Claude Code <noreply@anthropic.com>
License-Expression: CC-BY-4.0
License-File: LICENSE
Keywords: dineof,eof,gap-filling,oceanography,parafac,pytorch,remote-sensing,satellite,tensor-decomposition,tieof,tucker
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Atmospheric Science
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: Scientific/Engineering :: Image Processing
Requires-Python: >=3.12
Requires-Dist: loguru>=0.7
Requires-Dist: numpy>=2
Requires-Dist: scikit-learn>=1.5
Requires-Dist: scipy>=1.13
Requires-Dist: torch>=2.2
Requires-Dist: tqdm>=4.66
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == 'dev'
Provides-Extra: examples
Requires-Dist: matplotlib>=3.8; extra == 'examples'
Requires-Dist: xarray>=2024.1; extra == 'examples'
Description-Content-Type: text/markdown

# torch-ieof: `torch`-based implementation of `tieof`

This is a slimmed-down, PyTorch-backed rewrite of the original
[TIEOF](https://www.mdpi.com/2073-4441/13/18/2578) package. All three of the
paper's higher-order tensor decompositions — **PARAFAC** (CP-ALS), **HOOI**
(iterative Tucker), and **TruncHOSVD** (closed-form Tucker) — are re-implemented
in pure PyTorch with no `tensorly`, no `oct2py`, and no `ray`. A classical
2-D `DINEOF` estimator (iterative truncated SVD) is also included for
comparison. The package runs on `numpy>=2` / `scikit-learn>=1.5` / `torch>=2.2`
under Python 3.12+. The legacy GHER Fortran bridge, CLI scripts, and
`interpolator/` package were dropped; they remain on the `main_backup` branch.

> 📖 **Richer docs:** open [`README.html`](README.html) for a tabbed reference
> that summarises the three decomposition methods and the DINEOF reconstruction
> loop as described in the paper. 

## Install

```bash
pip install -e .            # or:  uv pip install -e .
pip install -e ".[dev]"     # with pytest
```

Or with `uv`:

```bash
uv add torch-ieof
```

Dependencies: `numpy>=2`, `scipy>=1.13`, `scikit-learn>=1.5`, `torch>=2.2`,
`tqdm>=4.66`, `loguru>=0.7`.

## Usage

### Raw numpy tensor

```python
import numpy as np
from torch_ieof import DINEOF3

shape = (n_lat, n_lon, n_time)
tensor = ...                # np.ndarray, NaN for missing values
model = DINEOF3(R=3, tensor_shape=shape)   # decomp_type="parafac" (default)
model._fit(tensor)
filled = model.reconstructed_tensor
```
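
To try the snippet above without real satellite data, one option is a synthetic low-rank field with random gaps. The sketch below is illustrative only: the sizes, the 30% gap fraction, and the random factors are assumptions, and it uses nothing but NumPy. The resulting `tensor` and `shape` follow the NaN-for-missing convention described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lat, n_lon, n_time = 20, 30, 40
rank = 3

# Random factor matrices for a synthetic rank-3 field (illustrative data only).
A = rng.normal(size=(n_lat, rank))
B = rng.normal(size=(n_lon, rank))
C = rng.normal(size=(n_time, rank))
truth = np.einsum("ir,jr,kr->ijk", A, B, C)

# Knock out roughly 30% of the entries at random and mark them as NaN.
missing = rng.random(truth.shape) < 0.3
tensor = np.where(missing, np.nan, truth)

shape = tensor.shape
print(shape, f"missing fraction: {np.isnan(tensor).mean():.2f}")
```

Keeping `truth` around makes it possible to compare a reconstruction against the known values at the masked positions.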

### Choosing a decomposition (`decomp_type`)

`DINEOF3` exposes all three engines from the paper. Pick with `decomp_type`:

```python
DINEOF3(R=3, tensor_shape=shape, decomp_type="parafac")      # CP-ALS (default)
DINEOF3(R=(4, 4, 6), tensor_shape=shape, decomp_type="hooi") # iterative Tucker
DINEOF3(R=4, tensor_shape=shape, decomp_type="trunchosvd")   # closed-form Tucker
```

- `parafac` — CP/PARAFAC via alternating least squares; `R` is a single
  integer rank shared across all modes. Most parsimonious and interpretable.
- `hooi` — Higher-Order Orthogonal Iteration (Tucker-ALS). `R` may be an int
  (broadcast to every mode) or a per-mode tuple `(R_lat, R_lon, R_time)`.
- `trunchosvd` — truncated HOSVD: the closed-form Tucker initialiser. Cheapest,
  but lower quality (no ALS refinement). `R` as for `hooi`.
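
The structural difference between the CP and Tucker models in the list above can be sketched in plain NumPy. This is not the package's code: the factor names (`A`, `U_lat`, `core`, ...) and the random values are assumptions chosen for illustration. It only shows how each model maps its factors back to a `(lat, lon, time)` tensor and how many parameters each one carries.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = (6, 7, 8)                      # (n_lat, n_lon, n_time)

# CP / PARAFAC: one shared rank R and one factor matrix per mode.
R = 3
A, B, C = (rng.normal(size=(n, R)) for n in dims)
cp_tensor = np.einsum("ir,jr,kr->ijk", A, B, C)

# Tucker (HOOI / truncated HOSVD): per-mode ranks plus a dense core tensor.
ranks = (4, 4, 6)
U_lat, U_lon, U_time = (rng.normal(size=(n, r)) for n, r in zip(dims, ranks))
core = rng.normal(size=ranks)
tucker_tensor = np.einsum("abc,ia,jb,kc->ijk", core, U_lat, U_lon, U_time)

# Parameter counts: CP grows linearly in R, Tucker also pays for the core.
cp_params = R * sum(dims)
tucker_params = core.size + sum(n * r for n, r in zip(dims, ranks))
print(cp_params, tucker_params)
```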

In the paper the three variants perform within each other's confidence
intervals — the gain over classical DINEOF comes from working in the full
3-D feature space, not from the specific decomposition.

After fitting, `model.predict_rank(k)` reconstructs the tensor using only the
first `k` components (an int for PARAFAC, an int or per-mode tuple for Tucker).
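
For a CP model, reconstructing from the first `k` components amounts to summing the first `k` rank-one terms. The NumPy sketch below illustrates that idea only. `cp_reconstruct` is a hypothetical helper, not the package's `predict_rank`, and how the package orders its components is not stated here.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 4
A, B, C = (rng.normal(size=(n, R)) for n in (5, 6, 7))


def cp_reconstruct(A, B, C, k):
    """Sum of the first k rank-one terms of a CP model (hypothetical helper)."""
    return np.einsum("ir,jr,kr->ijk", A[:, :k], B[:, :k], C[:, :k])


full = cp_reconstruct(A, B, C, R)
partial = cp_reconstruct(A, B, C, 2)

# The discarded terms account exactly for the difference to the full model.
remainder = np.einsum("ir,jr,kr->ijk", A[:, 2:], B[:, 2:], C[:, 2:])
```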

### xarray DataArray (single variable)

```python
from torch_ieof import reconstruct_dataarray

rec, model = reconstruct_dataarray(
    sst_da, R=3,
    lat_dim="lat", lon_dim="lon", time_dim="time",
    mask=land_mask,            # optional xr.DataArray or ndarray (lat, lon) bool
    to_center=True, nitemax=80, toliter=1e-4,
)
# rec has the same dim order, coords, and attrs as sst_da.
```

### xarray Dataset — multivariate joint reconstruction

When one variable has heavy cloud cover but a related variable (sharing
temporal dynamics) is more complete, jointly reconstructing them couples
the variables through a shared temporal factor (Alvera-Azcárate-style
multivariate DINEOF). Cloud gaps in the sparse variable are constrained
by simultaneous observations in the others.

```python
from torch_ieof import reconstruct_dataset

ds_recon, model = reconstruct_dataset(
    ds,                        # xr.Dataset of vars sharing (lat, lon, time)
    R=2 * 3,                   # rank — typically larger than per-var rank
    variables=["sst", "chl"],  # which vars to couple (default: all)
    masks={"sst": land_mask},  # optional per-var masks
    nitemax=120, toliter=1e-6,
)
```

Each variable is z-scored before stacking along the latitude axis, fit
with a single CP decomposition, then split back and denormalised.
`to_center=False` is the default for the joint path (z-scoring handles
centering).
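
The normalise, stack, split, and denormalise steps can be sketched with NumPy alone. This is a minimal illustration under assumptions: it uses one global mean and standard deviation per variable (the package may compute its statistics differently), and the CP fit itself is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lat, n_lon, n_time = 4, 5, 6
sst = 15.0 + 3.0 * rng.normal(size=(n_lat, n_lon, n_time))
chl = 0.5 + 0.1 * rng.normal(size=(n_lat, n_lon, n_time))
sst[rng.random(sst.shape) < 0.2] = np.nan          # gaps in the sparse variable

fields = {"sst": sst, "chl": chl}

# Per-variable z-score statistics, ignoring NaNs (assumed: global scalars).
stats = {name: (np.nanmean(v), np.nanstd(v)) for name, v in fields.items()}

# Stack the normalised variables along the latitude axis (axis 0).
stacked = np.concatenate(
    [(v - stats[name][0]) / stats[name][1] for name, v in fields.items()],
    axis=0,
)

# ... a single CP fit would fill the NaNs of `stacked` at this point ...

# Split back into one block per variable and undo the normalisation.
parts = np.split(stacked, len(fields), axis=0)
restored = {
    name: part * stats[name][1] + stats[name][0]
    for name, part in zip(fields, parts)
}
```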

### Classical 2-D DINEOF (truncated-SVD baseline)

For comparison, `DINEOF` implements the original 2-D method (Beckers & Rixen
2003): restrict to valid ocean pixels, reshape to a `(space, time)` matrix,
and alternate truncated-SVD reconstruction with re-imputation of the gaps.

```python
from torch_ieof import DINEOF

model = DINEOF(R=5, tensor_shape=shape, mask=ocean_mask)
model._fit(tensor)               # same NaN-for-missing convention as DINEOF3
filled = model.reconstructed_tensor
```

`DINEOF` exposes the same sklearn-style `fit`/`predict`/`score` API and
`predict_rank(k)` helper as `DINEOF3`.
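
The alternation between truncated SVD and gap re-imputation can be sketched in a few lines of NumPy. This is a generic illustration of the idea, not the package's implementation: `dineof_2d` is a hypothetical function, and it omits masking, centering, and any rank selection.

```python
import numpy as np


def dineof_2d(matrix, rank, n_iter=500, tol=1e-10):
    """Fill NaNs in a (space, time) matrix by iterating a truncated SVD."""
    missing = np.isnan(matrix)
    filled = np.where(missing, 0.0, matrix)        # initial guess for the gaps
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Largest change at the gap positions in this iteration.
        step = np.max(np.abs(low_rank[missing] - filled[missing]), initial=0.0)
        filled[missing] = low_rank[missing]        # re-impute the gaps only
        if step < tol:
            break
    return filled


# Synthetic rank-2 matrix with about 20% of the entries removed.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 40))
gaps = rng.random(truth.shape) < 0.2
observed = np.where(gaps, np.nan, truth)
filled = dineof_2d(observed, rank=2)
```

Observed entries are never overwritten; only the gap positions are updated from the low-rank approximation.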

### sklearn-style fit/predict

```python
# X: (N, 3) integer coords (lat_idx, lon_idx, t_idx)
# y: (N,) values (NaN allowed for missing points)
model = DINEOF3(R=3, tensor_shape=(n_lat, n_lon, n_time))
model.fit(X, y)
y_pred = model.predict(X_test)
score = model.score(X_test, y_test)   # negative NRMSE
```
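
If the data starts out as a dense tensor, the `(N, 3)` coordinate array and the value vector can be built with NumPy. This is a sketch of one way to do it; the small random tensor is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
tensor = rng.normal(size=(4, 5, 6))                # (n_lat, n_lon, n_time)
tensor[rng.random(tensor.shape) < 0.25] = np.nan   # missing points

# Every cell's (lat_idx, lon_idx, t_idx) in C order, plus the matching values.
X = np.indices(tensor.shape).reshape(3, -1).T      # shape (N, 3)
y = tensor.ravel()                                 # shape (N,), NaN where missing

# Alternatively, keep only the observed points.
observed = ~np.isnan(y)
X_obs, y_obs = X[observed], y[observed]
```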

Key options:

- `R` — rank. Int (CP rank, or broadcast Tucker rank) or per-mode tuple for
  Tucker variants.
- `decomp_type` — `"parafac"` (default), `"hooi"`, or `"trunchosvd"`.
- `mask` — `(n_lat, n_lon)` boolean array (or `.npy` path). `True` = inside the
  investigated area. Cells outside are zeroed during fitting and set to NaN in
  the output.
- `to_center` / `lat_lon_sep_centering` — centre the tensor before fitting.
- `keep_non_negative_only` — clamp negative reconstructed values to 0.
- `early_stopping` — stop on absolute **or** gradient convergence (the paper's
  early-stopping mode); else stop on absolute error only.
- `nitemax`, `toliter` — outer reconstruction loop budget / tolerance.
- `td_iter_max`, `tol` — inner decomposition (CP-ALS / HOOI) budget / tolerance.
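
The effect of `mask` and `keep_non_negative_only` on a reconstruction can be pictured with a short NumPy sketch. It mirrors the behaviour described in the list above and is not the package's code; the array `recon` is a random stand-in for a fitted reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lat, n_lon, n_time = 4, 5, 3
recon = rng.normal(size=(n_lat, n_lon, n_time))    # stand-in reconstruction

mask = np.ones((n_lat, n_lon), dtype=bool)         # True = inside the area
mask[0, :] = False                                 # e.g. a row of land pixels
inside = mask[:, :, None]                          # broadcast over time

# During fitting: cells outside the mask are zeroed.
fit_input = np.where(inside, recon, 0.0)

# In the output: clamp negatives to 0, then set cells outside the mask to NaN.
output = np.where(inside, np.clip(recon, 0.0, None), np.nan)
```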

## Performance, logging & progress

The hot loop runs in PyTorch. By default the device is `cuda` if available,
otherwise `cpu`. MPS is skipped automatically because per-iteration CPU
fallbacks for SVD / `pinv` make it slower than CPU for typical CP ranks.
Override with `device="mps"` for very large tensors, or pass
`dtype=torch.float64` for tighter numerics. The default dtype is `float32`.

Each fit logs a single INFO line at the start (shape, R, device, dtype,
missing fraction) and one at the end (iters, final error). Iteration
progress is shown via a live `tqdm` bar with `err` and `Δerr` in the
postfix. Disable the bar with `progress=False` if you're piping output to
a file or running in CI:

```python
DINEOF3(R=4, tensor_shape=shape, progress=False)
reconstruct_dataarray(da, R=4, progress=False)
```

To suppress the INFO logs too:

```python
from loguru import logger
logger.disable("torch_ieof")
```

## Tests

```bash
pytest -q
```

Covers Kolda-convention unfolding, Khatri-Rao, CP reconstruction, end-to-end
recovery of a noisy synthetic rank-R tensor, and a cloud-like patchy-mask
case with a moving spatial feature.
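
For readers unfamiliar with the conventions being tested, the following NumPy sketch shows Kolda-style mode-n unfolding and the Khatri-Rao product, and checks the standard identity linking them for a CP tensor. The function names are illustrative and are not taken from the package.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding, Kolda & Bader convention (earlier modes vary fastest)."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.reshape(moved, (tensor.shape[mode], -1), order="F")


def khatri_rao(P, Q):
    """Column-wise Kronecker product; rows are ordered with Q's index fastest."""
    return np.einsum("ir,jr->ijr", P, Q).reshape(-1, P.shape[1])


rng = np.random.default_rng(0)
A, B, C = (rng.normal(size=(n, 3)) for n in (4, 5, 6))
X = np.einsum("ir,jr,kr->ijk", A, B, C)            # CP reconstruction

# Each unfolding equals its own factor times the Khatri-Rao of the other two.
ok = [
    np.allclose(unfold(X, 0), A @ khatri_rao(C, B).T),
    np.allclose(unfold(X, 1), B @ khatri_rao(C, A).T),
    np.allclose(unfold(X, 2), C @ khatri_rao(B, A).T),
]
print(ok)
```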

## Examples

Install the extra dependencies (`matplotlib`, `xarray`) and run:

```bash
pip install -e ".[examples]"
python examples/timeseries_demo.py   # SST-like field, cloud-blob gaps, saves PNG
python examples/xarray_workflow.py   # round-trip via xarray.DataArray
```

`examples/xarray_workflow.py` includes a small `reconstruct_dataarray()` helper
showing how to wrap DINEOF3 for an `xarray.DataArray` with arbitrary dim order —
useful as a template for plugging into an existing oceanographic workflow.

## Credit, citing & license

The TIEOF method and its original implementation are the work of **Kulikov,
Inkova, Cherniuk, Teslyuk & Namsaraev**. This package is a PyTorch repackaging;
all scientific credit belongs to them. If you use this in academic work, please
cite the original paper:

> Kulikov, L.; Inkova, N.; Cherniuk, D.; Teslyuk, A.; Namsaraev, Z. *TIEOF:
> Algorithm for Recovery of Missing Multidimensional Satellite Data on Water
> Bodies Based on Higher-Order Tensor Decompositions.* Water 2021, 13(18), 2578.
> <https://doi.org/10.3390/w13182578>

Original code: <https://github.com/theleokul/tieof>.

Licensed under **CC BY 4.0** — free to use, share and adapt with attribution to
the authors above. See [`LICENSE`](LICENSE).
