Metadata-Version: 2.4
Name: depth-recon
Version: 0.0.1a6
Summary: DepthDif public inference helpers for sparse ocean temperature diffusion.
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: click>=8.2.1
Requires-Dist: copernicusmarine
Requires-Dist: h5netcdf
Requires-Dist: einops
Requires-Dist: matplotlib
Requires-Dist: numpy<2.1,>=1.26
Requires-Dist: pandas
Requires-Dist: pytorch-lightning
Requires-Dist: PyYAML
Requires-Dist: rasterio
Requires-Dist: scipy
Requires-Dist: torch
Requires-Dist: tqdm
Requires-Dist: xarray

<p align="center">
  <img src="docs/assets/branding/banner_depthdif.png" width="65%" style="border-radius: 12px;" />
</p>

<p align="center">
  <a href="https://depthdif.donike.net/">
    <img src="https://img.shields.io/badge/Visit-Documentation-0b2e4f?style=for-the-badge" alt="Open Documentation" />
  </a>
  <a href="https://depthdif.donike.net/experiments/">
    <img src="https://img.shields.io/badge/Open-Experiments-0f3f68?style=for-the-badge" alt="Check Experiments" />
  </a>
</p>

# DepthDif

DepthDif is a conditional diffusion project for densifying sparse ocean temperature observations. Visit the [Documentation](https://depthdif.donike.net/) for more information on the models, datasets, and auxiliary data, or follow along with the [Experiments](https://depthdif.donike.net/experiments/).



## Installation

This project uses Python 3.12.3 (the package itself requires Python 3.12 or newer). Install the dependencies with:

```bash
python -m pip install -r requirements.txt
```

For public inference usage, the package can also be installed in editable mode:

```bash
python -m pip install -e .
```

PyPI releases are published by GitHub Actions when a version tag such as
`v0.1.0` is pushed on `main`. The tag must match `project.version` in
`pyproject.toml`, and the PyPI project must be configured for trusted publishing
from the repository's `pypi` environment.

## Model Overview

- Model: `PixelDiffusionConditional` (conditional pixel-space diffusion with ConvNeXt U-Net denoiser).
- Active dataset: `data/dataset_argo_netcdf_gridded.py` (`ArgoNetCDFGriddedPatchDataset`) lazily builds model-ready patches from ARGO/EN4, GLORYS, OSTIA, and sea-level NetCDF files without writing patch exports.
- Optional dataset ablation: `dataset.synthetic.enabled=true` builds sparse `x` from random GLORYS `y` pixels, controlled by `dataset.synthetic.pixel_count`.
- Config layout:
  - `configs/px_space/`: active pixel-space diffusion configs
  - `configs/lat_space/`: latent-space model/training/autoencoder configs
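
To illustrate the synthetic ablation mentioned in the list above, here is a rough sketch of how a sparse `x` could be drawn from random valid pixels of a dense `y`. It is not the dataset class's actual code: the function name, the 2D `(height, width)` layout, and the use of `NaN` for unobserved pixels are assumptions made for the example.

```python
import numpy as np


def sample_sparse_input(y, y_valid_mask, pixel_count, rng):
    """Keep `pixel_count` random valid pixels of y; all other pixels become NaN."""
    valid_idx = np.flatnonzero(y_valid_mask)
    # Never request more pixels than there are valid ones.
    count = min(pixel_count, valid_idx.size)
    keep = rng.choice(valid_idx, size=count, replace=False)
    x_valid_mask = np.zeros(y.shape, dtype=bool)
    x_valid_mask.flat[keep] = True
    # Assumption: unobserved pixels are represented as NaN.
    x = np.where(x_valid_mask, y, np.nan)
    return x, x_valid_mask


rng = np.random.default_rng(0)
y = np.arange(16.0).reshape(4, 4)
y_valid_mask = y >= 4  # pretend the first row is invalid (e.g. land)
x, x_valid_mask = sample_sparse_input(y, y_valid_mask, pixel_count=5, rng=rng)
print(int(x_valid_mask.sum()))
```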

DepthDif is a conditional diffusion model: it reconstructs dense GLORYS depth fields from sparse ARGO profile observations, conditioned on OSTIA sea surface temperature (SST) plus coordinate/date context.

Ambient-occlusion training is available via `model.ambient_occlusion.*`. During training the model receives a further-corrupted sparse Argo input, while the loss is evaluated on the original `x` support intersected with the valid `y` support (`x_valid_mask ∩ y_valid_mask`). With the current `x0` training preset, the model predicts the clean target on that masked support rather than on the old missing-pixel region. At inference time, both standard and ambient outputs are masked back to `NaN` wherever `y_valid_mask==0`. When `clamp_known_pixels=false`, ambient mode does not overwrite the output with observed `x` values after sampling.
See `docs/ambient-occlusion-objective.md` for the full mathematical objective, figure walkthrough, and citation.

![depthdif_schema](docs/assets/figures/depthdif_schema.png)
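
The masking described in the ambient-occlusion paragraph can be sketched with NumPy. This is an illustration only: the function names are hypothetical, and mean squared error is an assumed stand-in for the loss (the actual objective is defined in `docs/ambient-occlusion-objective.md`).

```python
import numpy as np


def masked_mse(pred, target, x_valid_mask, y_valid_mask):
    """Mean squared error restricted to x_valid_mask ∩ y_valid_mask."""
    support = x_valid_mask.astype(bool) & y_valid_mask.astype(bool)
    if not support.any():
        return 0.0  # no supervised pixels in this sample
    # Assumption: squared error stands in for the real training loss.
    return float(np.mean((pred[support] - target[support]) ** 2))


def mask_invalid_output(pred, y_valid_mask):
    """Set predictions to NaN wherever y_valid_mask == 0."""
    return np.where(y_valid_mask.astype(bool), pred, np.nan)


pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 0.0], [0.0, 4.0]])
x_valid_mask = np.array([[1, 1], [0, 0]])
y_valid_mask = np.array([[1, 1], [1, 0]])

loss = masked_mse(pred, target, x_valid_mask, y_valid_mask)
output = mask_invalid_output(pred, y_valid_mask)
print(loss)
print(output)
```

Only the two pixels in the top row lie in both masks, so the bottom-left error does not contribute to the loss even though `y` is valid there.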

## Training

OSTIA + Argo NetCDF training:

```bash
/work/envs/depth/bin/python train.py \
  --data-config configs/px_space/data_ostia_argo_netcdf.yaml \
  --train-config configs/px_space/training_config.yaml \
  --model-config configs/px_space/model_config.yaml
```

Ambient-occlusion objective example:

```bash
/work/envs/depth/bin/python train.py \
  --data-config configs/px_space/data_ostia_argo_netcdf.yaml \
  --train-config configs/px_space/training_config.yaml \
  --model-config configs/px_space/model_config_ambient.yaml \
  --set training.wandb.run_name=ambient_ostia_argo_netcdf_v1
```
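
The `--set` flag in the command above takes a dotted key path and a value. As a rough sketch of how such an override could be applied to a nested config (the helper `apply_override` is hypothetical and is not taken from `train.py`; values are kept as plain strings here):

```python
def apply_override(config: dict, assignment: str) -> dict:
    """Apply one 'a.b.c=value' override to a nested dict in place."""
    dotted_key, _, value = assignment.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections that do not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value  # kept as a string; type parsing is left out of this sketch
    return config


config = {"training": {"wandb": {"run_name": "default"}}}
apply_override(config, "training.wandb.run_name=ambient_ostia_argo_netcdf_v1")
print(config["training"]["wandb"]["run_name"])
```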

Notes:
- `--train-config` and `--training-config` are equivalent.
- Training outputs are written under `logs/<timestamp>/` with `best.ckpt` and `last.ckpt`.
- `model.resume_checkpoint` resumes full Lightning state; `model.load_checkpoint` warm-starts by loading only model weights.
- Latent diffusion workflow configs live in `configs/lat_space/`; see `docs/autoencoder.md` for AE + latent setup and launch commands.
- Latent launcher scripts: `scripts/train_autoencoder.sh`, `scripts/train_latent_diffusion.sh`.

## Inference

Public ISO-week inference API:

```python
from depth_recon import run_week_inference

run_dir = run_week_inference(
    year=2015,
    iso_week=25,
    rectangle=(-20.0, 30.0, 10.0, 50.0),
    device="cuda",
    config_repo="simon-donike/DepthDif",
)
```

The public API downloads configs/checkpoints and the land mask from Hugging Face,
downloads EN4/ARGO and, by default, OSTIA for the selected ISO week, and returns
the GeoTIFF run directory. Pass `auto_download_ostia=False` without `ostia_dir`
to run ARGO-only inference. GLORYS is not required for the standard public
inference path; it is only needed for training or optional ground-truth
comparison exports.
OSTIA downloads use the Copernicus Marine CLI credentials configured in the
environment, or credentials passed to `run_week_inference` via
`copernicus_username` plus `copernicus_token`. The Copernicus Marine toolbox
accepts that token through its password field, so `copernicus_password` remains
supported as a backwards-compatible alias.
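
For orientation, the calendar days covered by a `year`/`iso_week` pair can be listed with the standard library. This sketch only shows the ISO 8601 week convention; whether the package selects source files with exactly this logic is an assumption, and `iso_week_days` is a hypothetical helper.

```python
from datetime import date, timedelta


def iso_week_days(year: int, iso_week: int) -> list[date]:
    """Return the seven dates (Monday to Sunday) of an ISO 8601 week."""
    monday = date.fromisocalendar(year, iso_week, 1)
    return [monday + timedelta(days=offset) for offset in range(7)]


days = iso_week_days(2015, 25)
print(days[0], days[-1])  # 2015-06-15 2015-06-21
```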

By default, the package uses `simon-donike/DepthDif` at revision `main`,
`model_config.yaml`, and `depthdif_v1.ckpt`.

To fetch source files separately:

```bash
depth-recon-download-argo --year 2015 --iso-week 25 --output-dir ./en4_profiles
depth-recon-download-ostia --year 2015 --iso-week 25 --output-dir ./ostia
```

To run inference from a repository checkout instead, use `inference/run_single.py`:

1. Set config/checkpoint constants at the top of `inference/run_single.py` (`MODEL_CONFIG_PATH`, `DATA_CONFIG_PATH`, `TRAIN_CONFIG_PATH`, `CHECKPOINT_PATH`).
   For the active EO setup in this repository, use:
   `configs/px_space/model_config.yaml`, `configs/px_space/data_ostia_argo_netcdf.yaml`, `configs/px_space/training_config.yaml`
2. Choose `MODE` (`"dataloader"` or `"random"`).
3. Run:

```bash
/work/envs/depth/bin/python inference/run_single.py
```

For a full spatial export, use `inference/export_global.py`. It selects one exact daily snapshot from the configured patch dataset (directly or via ISO week/year), runs inference on every patch for that day, streams the accumulation to disk, and writes stitched prediction and GLORYS GeoTIFFs under `inference/outputs/global_top_band_<YYYYMMDD>/`. The exported depth levels are Surface, 10m, 50m, 100m, 250m, 500m, 1000m, 2000m, 2500m, and 5000m. Requested depths are mapped to the nearest GLORYS channel, and each TIFF records both the requested and the actual source depth in its metadata.

By default it also writes GeoJSON exports for observed Argo point locations, sampled full-profile locations with per-point graphs, and train/val patch squares. Pass `--prediction-ensemble-runs 5` to average five stochastic predictions per patch before writing the GeoTIFFs consumed by the globe packager; the default `1` keeps the existing single-run behavior.
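
The nearest-channel mapping described above can be sketched as follows. The channel depths in the example are hypothetical placeholders rather than the dataset's actual GLORYS levels, and `nearest_channel` is not a function from `inference/export_global.py`.

```python
def nearest_channel(requested_depth_m: float, channel_depths_m: list[float]) -> tuple[int, float]:
    """Return (channel index, actual depth) of the channel closest to the request."""
    index = min(
        range(len(channel_depths_m)),
        key=lambda i: abs(channel_depths_m[i] - requested_depth_m),
    )
    return index, channel_depths_m[index]


# Hypothetical channel depths in metres, for illustration only.
channel_depths_m = [0.5, 9.5, 47.0, 110.0, 266.0, 540.0]

for requested in (10.0, 100.0, 500.0):
    index, actual = nearest_channel(requested, channel_depths_m)
    print(f"requested={requested} m -> channel {index} at {actual} m")
```

Recording both the requested and the actual depth, as the exporter does in the TIFF metadata, keeps the difference between the two visible to downstream users.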

For a pooled validation-set depth summary, use `inference/export_validation_error_summary.py`. It loads the configured dataset `val` split, runs inference across the whole split, computes per-depth median absolute error against both GLORYS and the observed ARGO values, writes `validation_error_by_depth.csv`, and saves both a single-panel error graph and a two-panel median-profile/error figure under `inference/outputs/validation_error_summary/` by default.

```bash
/work/envs/depth/bin/python inference/export_validation_error_summary.py \
  --data-config configs/px_space/data_ostia_argo_netcdf.yaml \
  --checkpoint logs/<run>/best.ckpt \
  --split val \
  --year 2015 \
  --iso-week 25 \
  --device cuda
```
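
As a rough illustration of the per-depth metric, the sketch below computes a median absolute error per depth level while ignoring missing values. The `(depth, height, width)` array layout and the use of `NaN` for missing pixels are assumptions, and the function is not taken from `inference/export_validation_error_summary.py`.

```python
import numpy as np


def median_abs_error_by_depth(pred, reference):
    """Median absolute error per depth level; NaN entries are ignored.

    Both inputs are assumed to have shape (depth, height, width).
    """
    abs_err = np.abs(pred - reference)
    # Flatten the spatial dimensions so each row holds one depth level.
    flat = abs_err.reshape(abs_err.shape[0], -1)
    return np.nanmedian(flat, axis=1)


pred = np.array([[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]])
reference = np.array([[[1.0, 1.0, np.nan]], [[2.0, 5.0, 9.0]]])
errors = median_abs_error_by_depth(pred, reference)
print(errors)
```

The same function can be called once with GLORYS as the reference and once with the observed ARGO values, where most pixels would be `NaN`.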

To package one exported run for the Cesium globe viewer in the docs, use:

```bash
/work/envs/depth/bin/python inference/export_cesium_globe_assets.py \
  --run-dir inference/outputs/global_top_band_<YYYYMMDD> \
  --public-base-url https://<bucket-or-site>/inference_production/globe/ \
  --rclone-remote r2:<bucket>/inference_production/globe \
  --rclone-sync-scope globe
```

The globe packager tiles every exported depth level into Cesium-ready folders and uploads those tiled assets, GeoJSON, graph PNGs, and `globe-config.json` when `--rclone-sync-scope globe` is used. Raw GeoTIFFs remain local in the run directory. The standalone viewer page lives at `docs/globe/index.html` and can load a hosted `globe-config.json`.

## Experiment Script

Use `experiments.py` for quick qualitative ablations on a single dataloader sample. It loads the configured model and checkpoint, runs a few fixed conditioning cases (`eo_plus_x`, `x_only_no_eo`, `coords_date_only_no_eo_no_x`), saves comparison plots under `temp/images/`, and prints compact tensor statistics for each case.

Typical run:

```bash
/work/envs/depth/bin/python experiments.py
```

Before running, check the config and checkpoint constants at the top of `experiments.py` if you want a different model, dataset split, or checkpoint.

## Documentation

- Full documentation: `docs/` (or build/serve with MkDocs).
- Autoencoder + latent workflow guide: `docs/autoencoder.md`.
- Experiments page: `docs/experiments.md`.
