Metadata-Version: 2.4
Name: toploss
Version: 0.1.0
Summary: Topological Loss Engineering: differentiable, optimizer-free regularizers that embed 2024-2026 optimizer breakthroughs (Muon/XSAM/CWD/AdEMAMix/NTKMTL/SymNoise) directly into the loss.
Author: Rishabh A. Patil
License: MIT
Project-URL: Homepage, https://github.com/MrRobotop/toploss
Project-URL: Documentation, https://github.com/MrRobotop/toploss#readme
Project-URL: Source, https://github.com/MrRobotop/toploss
Keywords: deep-learning,optimization,regularization,loss-functions,sharpness-aware,weight-decay,multi-task-learning,pytorch
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.12
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: numpy; extra == "dev"
Requires-Dist: matplotlib; extra == "dev"
Dynamic: license-file

# toploss: Topological Loss Engineering for PyTorch

> Differentiable, **optimizer-free** regularizers that embed the 2024–2026 wave of
> optimizer breakthroughs (Muon/XSAM, Cautious Weight Decay, AdEMAMix, NTKMTL,
> SymNoise) *directly into the loss function*.

Modern training recipes push networks toward flat, well-conditioned minima by
modifying the **optimizer** (SAM, Muon, CWD, …). `toploss` takes the opposite,
more portable route: it expresses each of those topological constraints as a
**plain penalty added to the loss**, so a vanilla `SGD`/`Adam` optimizer reproduces
the behaviour without any custom optimizer code.
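The following is a minimal illustration of the loss-side route. It uses plain L2 weight decay rather than any `toploss` regularizer. `torch.optim.SGD(..., weight_decay=λ)` adds `λθ` to the gradient inside the optimizer, while plain SGD on `L(θ) + (λ/2)‖θ‖²` gets the same term from autograd. The quadratic `base_loss` is an arbitrary stand-in for a real training loss.

```python
import torch

torch.manual_seed(0)
lam, lr = 0.1, 0.5
theta0 = torch.randn(5)
target = torch.randn(5)

def base_loss(theta):
    # Arbitrary quadratic stand-in for the training loss L(theta).
    return 0.5 * ((theta - target) ** 2).sum()

# (a) Optimizer-side: SGD adds weight_decay * theta to the gradient internally.
a = theta0.clone().requires_grad_(True)
opt_a = torch.optim.SGD([a], lr=lr, weight_decay=lam)
base_loss(a).backward()
opt_a.step()

# (b) Loss-side: vanilla SGD on L(theta) + (lam / 2) * ||theta||^2.
b = theta0.clone().requires_grad_(True)
opt_b = torch.optim.SGD([b], lr=lr)
(base_loss(b) + 0.5 * lam * (b ** 2).sum()).backward()
opt_b.step()

print(torch.allclose(a, b))  # True: both steps apply -lr * (grad L + lam * theta)
```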

| Regularizer | Inspiration | One-line idea |
|---|---|---|
| **SASP** | SAM / XSAM | penalize the closed-form empirical-Fisher trace of the head, flattening curvature with **no extra backward pass** |
| **CVP**  | Cautious Weight Decay | sigmoid-gated weight decay (stop-grad) → sliding-mode volume control |
| **NGER** | NTKMTL + Excess Risk | softmax task weights `∝ excess_risk / ntk_eigenvalue^γ` |
| **SBMP** | NEFTune / SymNoise | KL consistency under symmetric-Bernoulli embedding noise |
| **DMTA** | AdEMAMix | align the current gradient with a slow-EMA descent direction |
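The "no extra backward pass" property of SASP has a closed form when the head is a single linear layer `z = W h + b` trained with cross-entropy. The per-sample gradient is `∇_W ℓ = (p - e_y) hᵀ`, so `‖∇_W ℓ‖²_F = ‖p - e_y‖² ‖h‖²`, plus `‖p - e_y‖²` for the bias. The sketch below assumes exactly that setup. `head_fisher_trace` is a hypothetical helper, not the `SASPLoss` implementation, whose normalization and `rho` scaling may differ. The check at the end compares the closed form against per-sample autograd gradients.

```python
import torch
import torch.nn.functional as F

def head_fisher_trace(logits, feats, targets, include_bias=True):
    """Closed-form empirical-Fisher trace of a linear head z = W h + b (hypothetical helper).

    Per sample: ||grad_W CE||_F^2 = ||p - e_y||^2 * ||h||^2, and ||grad_b CE||^2 = ||p - e_y||^2.
    Built only from forward-pass tensors, so it adds no extra backward pass.
    """
    probs = logits.softmax(dim=-1)
    resid = probs - F.one_hot(targets, logits.shape[-1]).to(probs.dtype)
    r2 = resid.pow(2).sum(dim=-1)   # ||p - e_y||^2 per sample
    h2 = feats.pow(2).sum(dim=-1)   # ||h||^2 per sample
    if include_bias:
        h2 = h2 + 1.0               # bias gradient contributes ||p - e_y||^2 * 1
    return (r2 * h2).mean()         # average over the batch = empirical-Fisher trace

torch.manual_seed(0)
head = torch.nn.Linear(8, 4)
h = torch.randn(6, 8)
y = torch.randint(0, 4, (6,))
closed = head_fisher_trace(head(h), h, y)

# Reference: mean per-sample squared gradient norm over (W, b) via autograd.
ref = 0.0
for i in range(h.shape[0]):
    loss_i = F.cross_entropy(head(h[i:i + 1]), y[i:i + 1])
    gW, gb = torch.autograd.grad(loss_i, (head.weight, head.bias))
    ref = ref + gW.pow(2).sum() + gb.pow(2).sum()
ref = ref / h.shape[0]

print(torch.allclose(closed, ref, atol=1e-6))  # True
```

Because `closed` is differentiable in both `logits` and `feats`, a penalty such as `rho * closed` can be added to the cross-entropy and handled by the same single `backward()`.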

## Install

```bash
pip install toploss          # from PyPI (once published)
# or, from source:
pip install -e .
```

## Quickstart

```python
import torch.nn.functional as F
from toploss import TopologicalLoss

# `model`, `x`, `y` and `optimizer` are your own; `model.forward_with_features`
# must return the head logits and the head inputs h.
crit = TopologicalLoss(lambda_sasp=1e-2, lambda_cvp=1e-2)

logits, feats = model.forward_with_features(x)   # head logits + head inputs h
base = F.cross_entropy(logits, y)
loss = crit(base_loss=base, logits=logits, features=feats, targets=y,
            params=model.parameters())           # grads read from p.grad
optimizer.zero_grad()                             # clear old grads before the new backward
loss.backward()
optimizer.step()                                  # any optimizer
```
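The quickstart assumes a model that returns both the head logits and the head inputs `h`. One way to provide that is to split the network into a body and a final `nn.Linear` head. This is sketched below with a hypothetical `MLPWithHead` class. Only the method name `forward_with_features` comes from the quickstart; the class, sizes and data are illustrative.

```python
import torch
import torch.nn as nn

class MLPWithHead(nn.Module):
    """Hypothetical classifier exposing (logits, head_inputs) for the quickstart."""

    def __init__(self, in_dim=20, hidden=64, num_classes=5):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_classes)  # final linear head

    def forward_with_features(self, x):
        h = self.body(x)           # h: the inputs the head sees
        return self.head(h), h     # (logits, features)

    def forward(self, x):
        return self.forward_with_features(x)[0]

torch.manual_seed(0)
model = MLPWithHead()
x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
logits, feats = model.forward_with_features(x)
print(logits.shape, feats.shape)  # torch.Size([32, 5]) torch.Size([32, 64])
```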

### Individual pieces

```python
import torch.nn.functional as F
from toploss import SASPLoss, CVPRegularizer, NGERWeighter, SBMPLoss, DMTATracker

sasp = SASPLoss(rho=1e-2)
loss = F.cross_entropy(logits, y) + sasp(logits, feats, y)

cvp = CVPRegularizer(lam=1e-2, beta=50.0)
loss = loss + cvp(model.parameters())            # forward-only: reads p.grad, no extra backward()

weighter = NGERWeighter(num_tasks=3, gamma=1.0)  # multi-task
mtl_loss = weighter([l1, l2, l3], [k1, k2, k3])  # l_t: task losses, k_t = ||grad_shared l_t||_F^2
```
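The NGER inputs can be computed with standard autograd. The sketch below makes three assumptions. It reads the table's `∝ excess_risk / ntk_eigenvalue^γ` as normalized weights, written as a softmax over log-ratios. It uses `k_t = ‖∇_shared l_t‖²` as the NTK proxy from the comment above. It takes each task's excess risk to be its current loss minus a floor of 0. The helper names and that floor are hypothetical and may not match `NGERWeighter`.

```python
import torch
import torch.nn as nn

def shared_grad_sq_norms(losses, shared_params):
    """Hypothetical helper: k_t = ||grad_shared l_t||^2 for each task loss l_t."""
    ks = []
    for l in losses:
        grads = torch.autograd.grad(l, shared_params, retain_graph=True)
        ks.append(sum(g.pow(2).sum() for g in grads))
    return torch.stack(ks).detach()

def nger_weights(excess_risk, k, gamma=1.0, eps=1e-12):
    """Hypothetical helper: weights w_t proportional to excess_risk_t / k_t**gamma, summing to 1."""
    log_ratio = torch.log(excess_risk.clamp_min(eps)) - gamma * torch.log(k.clamp_min(eps))
    return torch.softmax(log_ratio, dim=0).detach()  # detached: rescales, not reshapes, task grads

torch.manual_seed(0)
trunk = nn.Linear(10, 16)                              # shared parameters
heads = nn.ModuleList(nn.Linear(16, 1) for _ in range(3))
x = torch.randn(8, 10)
targets = torch.randn(3, 8, 1)
z = torch.relu(trunk(x))
losses = [nn.functional.mse_loss(hd(z), t) for hd, t in zip(heads, targets)]

k = shared_grad_sq_norms(losses, list(trunk.parameters()))
excess = torch.stack([l.detach() for l in losses])     # assumed floor of 0 per task
w = nger_weights(excess, k, gamma=1.0)
mtl_loss = (w * torch.stack(losses)).sum()
mtl_loss.backward()
```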

## Why it works (the correspondence principle)

For an optimizer update `u = -η (g + c(θ))` with `g = ∇L(θ)`, the discrete trajectory is,
to first order, gradient flow on `L(θ) + Φ(θ)` with `∇Φ = c`. Each `toploss` penalty is
such a `Φ`, so the *loss* now carries the topological bias that previously lived in the
optimizer. See the accompanying paper for the full derivations, proofs, and
experiments.
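As a worked instance of `∇Φ = c`, consider a gated weight decay in the spirit of CVP. The sketch assumes the gate is `σ(β·g·θ)` computed from the current `p.grad`, a smooth stand-in for Cautious Weight Decay's sign-agreement mask. `gated_decay_penalty` is hypothetical and is not the `CVPRegularizer` code. Because the gate is detached (stop-grad), autograd returns `c(θ) = λ·gate·θ`, the same term a cautious-decay optimizer would add to the update.

```python
import torch

def gated_decay_penalty(params, lam=1e-2, beta=50.0):
    """Hypothetical Phi = (lam/2) * sum(stopgrad(gate) * theta^2), gate = sigmoid(beta * g * theta).

    g is whatever p.grad currently holds (e.g. from the previous backward pass);
    parameters without a .grad are skipped.
    """
    total = 0.0
    for p in params:
        if p.grad is None:
            continue
        gate = torch.sigmoid(beta * p.grad * p.detach())  # no graph through the gate
        total = total + 0.5 * lam * (gate * p.pow(2)).sum()
    return total

torch.manual_seed(0)
theta = torch.randn(6, requires_grad=True)
theta.grad = torch.randn(6)                  # pretend gradient from an earlier backward()
phi = gated_decay_penalty([theta], lam=0.1, beta=50.0)
(c,) = torch.autograd.grad(phi, theta)       # c(theta) = grad of Phi

gate = torch.sigmoid(50.0 * theta.grad * theta.detach())
print(torch.allclose(c, 0.1 * gate * theta.detach()))  # True
```

With `β` large, the gate approaches a hard 0/1 mask that decays only coordinates where `g` and `θ` share a sign.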

## Tests

```bash
PYTHONPATH=src pytest tests/ -q
```

## License

MIT.
