Metadata-Version: 2.2
Name: torchzero
Version: 0.1.8
Summary: Modular optimization library for PyTorch.
Author-email: Ivan Nikishev <nkshv2@gmail.com>
License: MIT License
        
        Copyright (c) 2024 inikishev
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/inikishev/torchzero
Project-URL: Repository, https://github.com/inikishev/torchzero
Project-URL: Issues, https://github.com/inikishev/torchzero/issues
Keywords: optimization,optimizers,torch,neural networks,zeroth order,second order
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch
Requires-Dist: numpy
Requires-Dist: typing_extensions

![example workflow](https://github.com/inikishev/torchzero/actions/workflows/tests.yml/badge.svg)

# torchzero

`torchzero` implements a large number of optimization modules that can be chained together to create custom optimizers:

```py
import torchzero as tz

optimizer = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.Cautious(),
    tz.m.LR(1e-3),
    tz.m.WeightDecay(1e-4)
)

# standard training loop
for batch in dataset:
    preds = model(batch)
    loss = criterion(preds)
    optimizer.zero_grad()
    loss.backward()  # compute gradients before stepping
    optimizer.step()
```

Each module takes the output of the previous module and applies a further transformation. This modular design avoids redundant code, such as reimplementing cautioning, orthogonalization, Laplacian smoothing, etc. for every optimizer. It also makes it easy to experiment with grafting, interpolation between different optimizers, and more unusual combinations such as nested momentum.
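To make the chaining idea concrete, here is a minimal conceptual sketch in plain Python. It is not how torchzero is implemented; `clip`, `scale`, `chain`, and `apply_step` are hypothetical names used only for illustration. Each transform receives the update produced by the previous one, and the final update is subtracted from the parameters.

```python
from typing import Callable, List

Update = List[float]
Transform = Callable[[Update], Update]

def clip(max_abs: float) -> Transform:
    """Hypothetical transform: clamp every element to [-max_abs, max_abs]."""
    return lambda update: [max(-max_abs, min(max_abs, u)) for u in update]

def scale(lr: float) -> Transform:
    """Hypothetical transform: multiply the update by a learning rate."""
    return lambda update: [lr * u for u in update]

def chain(*transforms: Transform) -> Transform:
    """Feed the output of each transform into the next one."""
    def apply(update: Update) -> Update:
        for transform in transforms:
            update = transform(update)
        return update
    return apply

def apply_step(params: List[float], grads: List[float], transform: Transform) -> List[float]:
    """Start from the raw gradients, run the chain, subtract the result."""
    update = transform(grads)
    return [p - u for p, u in zip(params, update)]

# Clip first, then apply the learning rate.
pipeline = chain(clip(1.0), scale(0.1))
new_params = apply_step([1.0, -2.0], [5.0, -0.5], pipeline)
```

Because every transform has the same input and output type, a piece such as `clip` is written once and can be reused in any chain, which is the property the modular design relies on.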

Modules are not limited to gradient transformations. They can perform other operations like line searches, exponential moving average (EMA) and stochastic weight averaging (SWA), gradient accumulation, gradient approximation, and more.

There are over 100 modules, all accessible within the `tz.m` namespace. For example, the Adam update rule is available as `tz.m.Adam`. The complete list of modules is available in the [documentation](https://torchzero.readthedocs.io/en/latest/autoapi/torchzero/modules/index.html).

## Closure

Some modules and optimizers in torchzero, particularly line-search methods and gradient approximation modules, require a closure function, similar to how `torch.optim.LBFGS` works in PyTorch. In torchzero, the closure must accept a boolean `backward` argument (the argument can have any name). When `backward=True`, the closure should zero out old gradients with `optimizer.zero_grad()` and compute new gradients with `loss.backward()`.

```py
def closure(backward = True):
    preds = model(inputs)
    loss = loss_fn(preds, targets)

    if backward:
        optimizer.zero_grad()
        loss.backward()
    return loss

optimizer.step(closure)
```

If you intend to use gradient-free methods, the `backward` argument is still required in the closure; simply leave it unused. Gradient-free and gradient approximation methods always call the closure with `backward=False`.
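The sketch below illustrates why the `backward` flag matters for gradient-free methods. It uses a toy random-search step written directly in PyTorch; `random_search_step` is a hypothetical helper for illustration, not a torchzero module. The step only ever calls `closure(False)`, so no backward pass runs and no gradients are stored.

```python
import torch

torch.manual_seed(0)

# Toy problem: minimize ||w - target||^2.
w = torch.nn.Parameter(torch.zeros(3))
target = torch.tensor([1.0, -2.0, 0.5])

def closure(backward=True):
    loss = ((w - target) ** 2).sum()
    if backward:
        w.grad = None
        loss.backward()
    return loss

def random_search_step(params, closure, sigma=0.1):
    """Hypothetical gradient-free step: keep a random perturbation only if it lowers the loss."""
    with torch.no_grad():
        best = closure(False)  # loss only, no backward pass
        for p in params:
            old = p.clone()
            p.add_(torch.randn_like(p) * sigma)
            new = closure(False)
            if new < best:
                best = new
            else:
                p.copy_(old)  # reject the perturbation
    return best

initial = closure(False).item()
for _ in range(200):
    final = random_search_step([w], closure).item()
```

Since a perturbation is kept only when it improves the loss, `final` can never exceed `initial`, and `w.grad` stays `None` throughout.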

All built-in PyTorch optimizers, as well as most custom ones, support closures too. They call the closure without arguments, which is why `backward` should default to `True`. The code above therefore works with other optimizers out of the box, and you can switch between optimizers without rewriting your training loop.
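As a quick check of this interchangeability, the following sketch runs the same closure with two built-in PyTorch optimizers, `torch.optim.SGD` and `torch.optim.LBFGS`. The toy regression problem and the `make_problem` and `train` helpers are assumptions made up for this example.

```python
import torch

def make_problem():
    """Hypothetical toy regression: predict the sum of two inputs."""
    torch.manual_seed(0)
    model = torch.nn.Linear(2, 1)
    inputs = torch.randn(16, 2)
    targets = inputs.sum(dim=1, keepdim=True)
    return model, inputs, targets

def train(optimizer_cls, steps, **kwargs):
    model, inputs, targets = make_problem()
    optimizer = optimizer_cls(model.parameters(), **kwargs)
    loss_fn = torch.nn.MSELoss()

    def closure(backward=True):
        loss = loss_fn(model(inputs), targets)
        if backward:
            optimizer.zero_grad()
            loss.backward()
        return loss

    first = closure(False).item()
    for _ in range(steps):
        # Built-in optimizers call closure() with no arguments,
        # so the default backward=True is used.
        optimizer.step(closure)
    return first, closure(False).item()

sgd_first, sgd_last = train(torch.optim.SGD, steps=50, lr=0.1)
lbfgs_first, lbfgs_last = train(
    torch.optim.LBFGS, steps=5, line_search_fn="strong_wolfe"
)
```

Only the optimizer class and its keyword arguments change between the two runs; the closure and the loop stay the same.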

# Documentation

For more information on how to create, use and extend torchzero modules, please refer to the documentation at [torchzero.readthedocs.io](https://torchzero.readthedocs.io/en/latest/index.html).

# Extra

Some other optimization-related things in torchzero:

### scipy.optimize.minimize wrapper

A wrapper around `scipy.optimize.minimize` that supports both gradients and Hessians, computed via batched autograd.

```py
from torchzero.optim.wrappers.scipy import ScipyMinimize
opt = ScipyMinimize(model.parameters(), method = 'trust-krylov')
```

Use it like any other closure-based optimizer, but make sure the closure accepts the `backward` argument. Note that it performs a full minimization on each step.

### Nevergrad wrapper

[Nevergrad](https://github.com/facebookresearch/nevergrad) is an optimization library by Facebook Research with a very large number of gradient-free methods.

```py
import nevergrad as ng
from torchzero.optim.wrappers.nevergrad import NevergradOptimizer
opt = NevergradOptimizer(bench.parameters(), ng.optimizers.NGOptBase, budget = 1000)
```

Use it like any other closure-based optimizer, but make sure the closure accepts the `backward` argument.

### NLopt wrapper

[NLopt](https://nlopt.readthedocs.io/en/latest/NLopt_Algorithms/) is another optimization library similar to `scipy.optimize.minimize`, with a large number of both gradient-based and gradient-free methods.

```py
from torchzero.optim.wrappers.nlopt import NLOptOptimizer
opt = NLOptOptimizer(bench.parameters(), 'LD_TNEWTON_PRECOND_RESTART', maxeval = 1000)
```

Use it like any other closure-based optimizer, but make sure the closure accepts the `backward` argument. Note that it performs a full minimization on each step.
