Metadata-Version: 2.4
Name: ladam
Version: 0.4.0
Summary: LAdam: Laplacian Adam optimizer + physics-inspired ML toolkit (WaveNorm, ChiAnnealScheduler, neuron reordering)
Project-URL: Homepage, https://github.com/gpartin/ladam
Project-URL: Documentation, https://github.com/gpartin/ladam#usage
Project-URL: Issues, https://github.com/gpartin/ladam/issues
Author: Greg Partin
License: MIT
License-File: LICENSE
Keywords: adam,batch-norm-alternative,deep-learning,laplacian,normalization,optimizer,physics-inspired,pinn,pytorch,scheduler,scientific-ml,spatial-regularization,wave-equation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.8
Requires-Dist: torch>=1.10.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# LAdam

**Laplacian Adam optimizer + physics-inspired ML toolkit for PyTorch**

[![CI](https://github.com/gpartin/ladam/actions/workflows/ci.yml/badge.svg)](https://github.com/gpartin/ladam/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/ladam.svg)](https://pypi.org/project/ladam/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://python.org)

LAdam is a drop-in Adam replacement that applies **discrete Laplacian regularization** to Adam's second-moment estimate. The package also ships a toolkit of physics-inspired components for deep learning:

- **LAdam / LAdaGrad / LRMSProp** -- Spatially-coupled adaptive optimizers
- **ChiAnnealScheduler** -- 3-phase learning rate schedule (warmup -> constant -> cosine-squared decay)
- **WaveNorm / WaveNormDamped** -- Wave-equation normalization layers (BatchNorm alternative)
- **ChiNorm** -- Adaptive chi-field normalization
- **Neuron Reordering** -- Correlation-based neuron permutation for MLPs

## Installation

```bash
pip install ladam
```

## Quick Start

### Optimizer (drop-in Adam replacement)

```python
from ladam import LAdam

optimizer = LAdam(model.parameters(), lr=1e-3, c2=1e-4)

for batch in dataloader:
    loss = criterion(model(batch))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

### Full toolkit (optimizer + scheduler + normalization)

```python
from ladam import LAdam, ChiAnnealScheduler, WaveNorm
import torch.nn as nn

# Build model with WaveNorm instead of BatchNorm
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1),
    WaveNorm(32),            # <-- replaces nn.BatchNorm2d(32)
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

# Optimizer + scheduler
optimizer = LAdam(model.parameters(), lr=1e-3, c2=1e-4)
scheduler = ChiAnnealScheduler(optimizer, total_steps=10000)

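for step in range(10000):
    # train_step is your own forward pass returning a scalar loss;
    # batch is the next batch from your dataloader
    loss = train_step(model, batch)
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()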
```

## Benchmark Results (64-Experiment Campaign)

Validated across **55+ experiments** spanning 8 task domains, with 3 seeds per experiment and p-values computed for each comparison.

### Head-to-Head: LAdam vs Adam

| Task | Architecture | Metric | Adam | LAdam | Delta | Verdict |
|------|-------------|--------|------|-------|-------|---------|
| **Wave Eq. PINN** | 5x128 MLP | L2 Error | 0.0310 | **0.0172** | **-44.6%** | WIN |
| **FashionMNIST** | MLP+Chi+Reorder | Accuracy | 89.76% | **90.56%** | **+0.80%** | WIN |
| **CIFAR-10** | ResNet-18 | Accuracy | 67.96% | **73.39%** | **+5.43%** | WIN |
| **FashionMNIST** | Transformer | Accuracy | 89.46% | **89.66%** | **+0.20%** | WIN |
| **VOC 2012** | DeepLabV3 | mIoU | 5.40% | **5.61%** | **+0.21%** | WIN |
| **Rosenbrock** | Optimization | H0 Reject | -- | **REJECTED** | -- | WIN |
| Detection | Faster-RCNN | mAP | **74.01%** | 71.40% | -2.61% | LOSS |
| Audio | VGGish | Accuracy | **65.12%** | 64.26% | -0.86% | LOSS |
| Diffusion | U-Net | FID proxy | **0.0340** | 0.0349 | +2.7% | LOSS |
| GPT-2 | Transformer | Perplexity | **152** | 1098 | -- | LOSS |

**Win rate on decisive results: 10/22 = 45%**

### Head-to-Head: WaveNorm vs BatchNorm

| Task | Architecture | Metric | BatchNorm | WaveNorm | Delta | Verdict |
|------|-------------|--------|-----------|----------|-------|---------|
| **FashionMNIST** | CNN | Accuracy | 91.18% | **91.94%** | **+0.76%** | WIN |
| **SVHN** | CNN | Accuracy | 91.63% | **92.20%** | **+0.57%** | WIN |
| **CalHousing** | MLP | MSE | 0.213 | **0.184** | **-13.5%** | WIN |
| WineQuality | MLP | MSE | 0.668 | 0.663 | -0.8% | TIE |

### Where LAdam Excels

| Domain | Why It Works | Recommended c2 |
|--------|-------------|----------------|
| **PINNs / Scientific ML** | PDE losses have spatial structure in weight space | `1e-5` |
| **Transformers** | Attention heads have correlated geometry | `1e-4` |
| **MLPs + neuron reorder** | Reordering creates spatial structure for Laplacian | `3e-4` |
| **ResNets** | Channel Laplacian smooths correlated filters | `3e-4` |

### Where Adam Wins (use Adam instead)

| Domain | Why | Recommendation |
|--------|-----|----------------|
| **LLM fine-tuning** | Embedding layers need per-token specialization | Use Adam/AdamW |
| **Diffusion models** | U-Net denoising is already well-conditioned | Use Adam |
| **RL / Policy gradients** | High-variance gradients overwhelm spatial coupling | Use Adam |
| **Audio classification** | Non-spatial features in spectrograms | Use Adam |

## Auto-Configuration (New in v0.4.0)

Automatically configure LAdam with per-layer c2 values chosen from your model architecture:

```python
from ladam import auto_configure, analyze_model
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# One-line setup -- returns (optimizer, scheduler_or_None)
optimizer, scheduler = auto_configure(model, lr=1e-3, total_steps=10000)

# Or just analyze without creating an optimizer
report = analyze_model(model)
print(report)
# {'architecture': 'mlp', 'total_params': 203530, 'recommendation': 'good_fit',
#  'suitable_pct': 99.9, 'c2_map': {'linear': 1e-05, 'conv': 0.0, ...}}
```

`auto_configure` assigns per-layer-type c2 values derived from the 64-experiment benchmark:
- **Linear layers**: `c2=1e-5` (biggest wins in PINNs and MLPs)
- **Conv/Attention/Norm/Embedding**: `c2=0` (Laplacian coupling not beneficial)
- Returns a `ChiAnnealScheduler` when `total_steps > 0`
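
The per-layer-type assignment above can be pictured as ordinary PyTorch parameter groups. The sketch below is only an illustration: `build_param_groups` and `C2_BY_TYPE` are hypothetical names, the values are copied from the list above, and it assumes the optimizer reads `c2` from each parameter group, which is not confirmed here.

```python
import torch.nn as nn

# Assumed mapping, taken from the bullet list above (hypothetical name).
C2_BY_TYPE = {nn.Linear: 1e-5}


def build_param_groups(model, default_c2=0.0):
    """Group parameters by the c2 value assigned to their owning module type."""
    groups = {}
    for module in model.modules():
        # Only parameters owned directly by this module, not by its children.
        params = list(module.parameters(recurse=False))
        if not params:
            continue
        c2 = default_c2
        for layer_type, value in C2_BY_TYPE.items():
            if isinstance(module, layer_type):
                c2 = value
                break
        groups.setdefault(c2, []).extend(params)
    return [{"params": params, "c2": c2} for c2, params in groups.items()]


model = nn.Sequential(nn.Conv2d(1, 4, 3), nn.Flatten(), nn.Linear(4, 2))
groups = build_param_groups(model)
for group in groups:
    print(group["c2"], len(group["params"]))

# Hypothetical usage, assuming LAdam honours a per-group "c2" entry:
# optimizer = LAdam(groups, lr=1e-3)
```

In practice `auto_configure` does this for you; the sketch is mainly useful if you want to override the mapping for one layer type.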

## Components

### Optimizers

| Optimizer | Base | Laplacian target | Best for |
|-----------|------|------------------|----------|
| **LAdam** | Adam | Second moment v_t | PINNs, transformers, CNNs |
| **LAdaGrad** | AdaGrad | Cumulative sum G_t | Sparse features, NLP |
| **LRMSProp** | RMSProp | Running average v_t | RNNs, non-stationary losses |

```python
from ladam import LAdam, LAdaGrad, LRMSProp, suggest_c2

optimizer = LAdam(model.parameters(), lr=1e-3, c2=suggest_c2('pinn'))
```

### ChiAnnealScheduler

3-phase learning rate schedule inspired by chi-field annealing dynamics:

1. **Warmup** (5%): Linear ramp from 0 to base_lr
2. **Constant** (65%): Full learning rate for main training
3. **Settle** (30%): Cosine-squared decay to 0

```python
from ladam import LAdam, ChiAnnealScheduler

optimizer = LAdam(model.parameters(), lr=1e-3)
scheduler = ChiAnnealScheduler(
    optimizer,
    total_steps=len(dataloader) * num_epochs,
    warmup_frac=0.05,
    settle_frac=0.30,
)

for epoch in range(num_epochs):
    for batch in dataloader:
        loss = criterion(model(batch))
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```
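
To make the three phases concrete, here is a minimal sketch of the learning-rate multiplier as a plain function. `chi_anneal_factor` is a hypothetical name, and the exact settle curve (`cos^2` of a quarter period) is an assumption based on the "cosine-squared decay" description; the library's implementation may differ in detail.

```python
import math


def chi_anneal_factor(step, total_steps, warmup_frac=0.05, settle_frac=0.30):
    """Return the multiplier applied to base_lr at a given step (sketch)."""
    warmup_end = warmup_frac * total_steps
    settle_start = (1.0 - settle_frac) * total_steps
    settle_len = total_steps - settle_start

    if step < warmup_end:
        # Phase 1: linear ramp from 0 to 1
        return step / warmup_end
    if step < settle_start or settle_len <= 0:
        # Phase 2: constant at the full learning rate
        return 1.0
    # Phase 3: cosine-squared decay from 1 to 0 (assumed form)
    progress = min(1.0, (step - settle_start) / settle_len)
    return math.cos(0.5 * math.pi * progress) ** 2


for s in (0, 25, 50, 500, 850, 1000):
    print(s, round(chi_anneal_factor(s, 1000), 4))
```

A function of this shape can also be passed to `torch.optim.lr_scheduler.LambdaLR` if you want the same curve with a different optimizer.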

### WaveNorm / WaveNormDamped

Drop-in BatchNorm replacements that normalize via wave-equation evolution:

```python
from ladam import WaveNorm, WaveNormDamped

# Undamped -- best for regression tasks
norm = WaveNorm(num_features=64, n_steps=3)

# Damped -- fixes oscillation for classification with bounded outputs
norm = WaveNormDamped(num_features=64, n_steps=3, init_damping=0.5)
```

**WaveNormDamped** adds learnable per-feature damping. It automatically learns:
- Low damping for features that benefit from wave momentum (regression)
- High damping for features that need to settle (bounded classification)

### ChiNorm

Adaptive normalization based on local energy density:

```python
from ladam import ChiNorm

norm = ChiNorm(num_features=64, chi0_init=1.0, g_init=0.1)
```

### Neuron Reordering

Reorder MLP neurons so that correlated neurons sit next to each other, which gives
LAdam's Laplacian a spatial structure to exploit:

```python
from ladam import LAdam, compute_neuron_order, reorder_linear_layer

# 1. Train with Adam briefly to get meaningful activations
# 2. Collect hidden-layer activations
#    (collect_hidden_activations is not imported above; it stands for your own helper)
activations = collect_hidden_activations(model, dataloader)

# 3. Compute optimal ordering (greedy nearest-neighbor TSP)
order = compute_neuron_order(activations)

# 4. Reorder weights (network output is unchanged)
reorder_linear_layer(model.fc1, model.fc2, order)

# 5. Switch to LAdam for the rest of training
optimizer = LAdam(model.parameters(), lr=1e-3, c2=3e-4)
```
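
The two ideas in this recipe, a greedy nearest-neighbor ordering and a permutation that leaves the network output unchanged, can be checked with plain PyTorch. This is a sketch under assumptions: `greedy_neuron_order` is a hypothetical stand-in for `compute_neuron_order`, and using absolute Pearson correlation as the similarity measure is a guess, not the library's documented behavior.

```python
import torch
import torch.nn as nn


def greedy_neuron_order(activations):
    """Greedy nearest-neighbor tour over neurons (sketch).

    activations: tensor of shape (n_samples, n_neurons).
    """
    # Rows of the corrcoef input are variables, so transpose to (neurons, samples).
    # Constant (dead) neurons give NaN correlations; treat those as 0.
    corr = torch.nan_to_num(torch.corrcoef(activations.T)).abs()
    n = corr.shape[0]
    visited = torch.zeros(n, dtype=torch.bool)
    order = [0]
    visited[0] = True
    for _ in range(n - 1):
        scores = corr[order[-1]].clone()
        scores[visited] = -1.0  # never revisit a neuron
        nxt = int(torch.argmax(scores))
        order.append(nxt)
        visited[nxt] = True
    return torch.tensor(order)


torch.manual_seed(0)
fc1, fc2 = nn.Linear(8, 16), nn.Linear(16, 4)
x = torch.randn(64, 8)

with torch.no_grad():
    hidden = torch.relu(fc1(x))
    before = fc2(hidden)
    order = greedy_neuron_order(hidden)

    # Permute fc1's output neurons and fc2's matching input columns.
    fc1.weight.copy_(fc1.weight[order])
    fc1.bias.copy_(fc1.bias[order])
    fc2.weight.copy_(fc2.weight[:, order])

    after = fc2(torch.relu(fc1(x)))

print(torch.allclose(before, after, atol=1e-5))
```

Because the same permutation is applied to the rows of the first layer and the columns of the second, the hidden units are only relabeled, which is why the output is preserved up to floating-point rounding.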

## Parameters

### LAdam

| Parameter | Default | Description |
|-----------|---------|-------------|
| `lr` | 1e-3 | Learning rate |
| `betas` | (0.9, 0.999) | EMA coefficients |
| `eps` | 1e-8 | Numerical stability |
| `weight_decay` | 0 | L2 regularization |
| `c2` | 1e-4 | **Laplacian coupling strength** |
| `mode` | 'variance_lap' | Which quantity to smooth |
| `stencil` | '9point' | Laplacian stencil type |

### Choosing c2

| c2 | Best For | Notes |
|----|----------|-------|
| `1e-5` | PINNs, scientific ML | Gentle coupling, biggest error reduction |
| `1e-4` | Transformers, general | **Safe default** |
| `3e-4` | MLPs with reorder, ResNets | Stronger coupling for structured weights |
| `0` | Disable | Falls back to standard Adam |

## How It Works

Standard Adam computes per-parameter adaptive learning rates from the second moment:

```
v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2
lr_effective = lr / (sqrt(v_t) + eps)
```

LAdam adds a Laplacian coupling step:

```
v_smooth = v_t + c2 * laplacian(v_t)
lr_effective = lr / (sqrt(v_smooth) + eps)
```

The discrete Laplacian is computed via a single `F.conv2d` kernel (9-point isotropic by default) -- efficient and GPU-friendly. Overhead: **~2-5% wall-clock time** per step.
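
As a rough illustration of the coupling step above, the following sketch applies a 9-point stencil to a 2-D second-moment tensor with `F.conv2d`. The stencil weights, the replicate padding at the edges, and the final clamp are assumptions made for this example, and `smooth_second_moment` is a hypothetical name; the library's actual kernel and boundary handling may differ.

```python
import torch
import torch.nn.functional as F

# A common 9-point isotropic Laplacian stencil (assumed weights; sums to zero).
KERNEL = torch.tensor([[1.0, 4.0, 1.0],
                       [4.0, -20.0, 4.0],
                       [1.0, 4.0, 1.0]]) / 6.0


def smooth_second_moment(v, c2=1e-4):
    """Return v + c2 * laplacian(v) for a 2-D tensor v (sketch)."""
    x = v[None, None]                                 # (1, 1, H, W) for conv2d
    x = F.pad(x, (1, 1, 1, 1), mode="replicate")      # assumed edge handling
    lap = F.conv2d(x, KERNEL[None, None].to(v))[0, 0]
    # Clamp is an addition for this sketch: it keeps sqrt(v_smooth) well defined.
    return (v + c2 * lap).clamp_min(0.0)


v = torch.zeros(5, 5)
v[2, 2] = 1.0                      # a single parameter with a large second moment
v_smooth = smooth_second_moment(v)
lr_effective = 1e-3 / (v_smooth.sqrt() + 1e-8)
print(v_smooth[2, 2].item(), v_smooth[2, 1].item())
```

The spike leaks a small amount into its neighbors, so adjacent parameters end up with more similar effective learning rates. Only 2-D tensors are handled here; how 1-D parameters such as biases are treated is not covered by this sketch.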

## FAQ

**Q: Does this work for LLMs?**
A: No. LAdam hurts LLM training -- embedding layers need per-token specialization that the Laplacian destroys. Use Adam/AdamW.

**Q: How does WaveNorm compare to BatchNorm?**
A: WaveNorm beats BN on 3/4 tested tasks, with the biggest win on regression (-13.5% MSE). Use WaveNormDamped for classification.

**Q: What's the overhead?**
A: LAdam adds ~2-5% wall-clock time (single fused conv kernel). WaveNorm adds ~10-15% (3 leapfrog steps).

**Q: Why not smooth the gradient instead of the variance?**
A: [Osher et al. (2018)](https://arxiv.org/abs/1806.06317) explored Laplacian smoothing of gradients. We found that smoothing the *variance estimate* is more effective because it smooths the *learning rate landscape* rather than the *descent direction*.

## Citation

```bibtex
@software{partin2026ladam,
  author = {Partin, Greg},
  title = {LAdam: Spatially-Aware Adaptive Optimization via Laplacian-Regularized Variance Estimates},
  year = {2026},
  url = {https://github.com/gpartin/ladam}
}
```

## Support & Sponsorship

LAdam is **free and open-source** (MIT license). If it helps your research or saves you training time, consider supporting development:

| Tier | Price | What You Get |
|------|-------|--------------|
| **Community** | Free | Full library, all features, GitHub Issues |
| **Pro Sponsor** | $29/mo | Priority bug fixes, architecture review, direct support |
| **Enterprise** | Custom | Consulting, custom integration, model optimization reports |

[![Sponsor](https://img.shields.io/badge/Sponsor-%E2%9D%A4-red?logo=github)](https://github.com/sponsors/gpartin)
[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-tip-yellow?logo=buy-me-a-coffee)](https://buymeacoffee.com/emergentphysicslab)

## License

MIT. See [LICENSE](LICENSE) for details.
