Metadata-Version: 2.4
Name: deep-ensembles-pytorch
Version: 0.1.2
Summary: Deep Ensembles - Pytorch
Home-page: https://github.com/lucidrains/deep-ensembles-pytorch
Author: MohamedFarag21
Author-email: mibrahi2@uni-bonn.de
License: MIT
Keywords: artificial intelligence,deep learning,uncertainty estimation,ensemble methods,bayesian deep learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.8
Description-Content-Type: text/markdown
Requires-Dist: accelerate
Requires-Dist: einops>=0.7
Requires-Dist: ema-pytorch>=0.4.2
Requires-Dist: torch>=2.0
Requires-Dist: tqdm
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: requires-dist
Dynamic: summary

## Deep Ensembles, in Pytorch

Implementation of [Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles](https://arxiv.org/abs/1612.01474) (NeurIPS 2017) in Pytorch.

A dead-simple, non-Bayesian approach to uncertainty quantification: train M networks independently from different random initialisations and aggregate their predictions as a mixture of Gaussians. The paper shows this matches or beats approximate Bayesian methods at a fraction of the complexity.

## Install

```bash
$ pip install deep-ensembles-pytorch
```

## Usage

### Regression with Uncertainty

```python
import torch
from deep_ensembles_pytorch import DeepEnsemble

# build ensemble of 5 members — paper default
ensemble = DeepEnsemble(
    dim_in = 1,
    dim_out = 1,
    num_members = 5,
    hidden_dim = 100,
    depth = 3,
    use_adversarial_training = True,   # FGSM smoothing (Sec. 2.3 of the paper)
    adversarial_eps = 0.01,
)

# toy regression data
X = torch.randn(1000, 1)
y = X.sin() + 0.1 * torch.randn_like(X)

loss = ensemble(X, y)   # forward with targets returns the training loss
loss.backward()

# inference — returns mixture-of-Gaussians aggregate
pred = ensemble(X)
pred.mean      # (B, 1)  — predictive mean
pred.variance  # (B, 1)  — predictive variance (epistemic + aleatoric)
```
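
The snippet above shows a single forward/backward pass. To fit the ensemble by hand, a plain PyTorch optimizer loop works, assuming `DeepEnsemble` is a standard `nn.Module` (the `parameters()` call below relies on that). A minimal sketch, not the package's training code; the `Trainer` below wraps this plus EMA and checkpointing:

```python
from torch.optim import Adam

opt = Adam(ensemble.parameters(), lr = 1e-3)

for _ in range(1000):
    opt.zero_grad()
    loss = ensemble(X, y)   # Gaussian NLL, aggregated over members
    loss.backward()
    opt.step()
```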

### Classification with Uncertainty

```python
from deep_ensembles_pytorch import DeepEnsembleClassifier

classifier = DeepEnsembleClassifier(
    dim_in = 784,
    num_classes = 10,
    num_members = 5,
    hidden_dim = 200,
    depth = 3,
)

X = torch.randn(32, 784)
target = torch.randint(0, 10, (32,))

loss = classifier(X, target)
loss.backward()

# inference
pred = classifier(X)
pred.probs     # (B, C)  — ensemble-averaged softmax
pred.variance  # (B, 1)  — predictive entropy (higher = more uncertain)
```
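
For reference, the "predictive entropy" reported above is the entropy of the ensemble-averaged softmax, H(p̄) = −Σ_c p̄_c log p̄_c. A sketch reproducing it by hand from the `pred.probs` tensor:

```python
probs = pred.probs.clamp_min(1e-12)   # guard against log(0)

# (B, 1) predictive entropy of the averaged softmax
entropy = (-probs * probs.log()).sum(dim = -1, keepdim = True)
```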

### Full Training with `Trainer`

```python
import torch
from deep_ensembles_pytorch import DeepEnsemble, TensorDataset, Trainer

ensemble = DeepEnsemble(
    dim_in = 13,   # e.g. Boston Housing
    dim_out = 1,
    num_members = 5,
    hidden_dim = 100,
    depth = 3,
)

X = torch.randn(500, 13)
y = torch.randn(500, 1)
dataset = TensorDataset(X, y)

trainer = Trainer(
    ensemble,
    dataset,
    train_batch_size = 100,
    train_lr = 1e-3,
    train_num_steps = 10_000,
    ema_decay = 0.995,
    amp = False,
)

trainer.train()
# checkpoints saved to ./results/
```

### Multi-GPU

```bash
$ accelerate config
$ accelerate launch train.py
```
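
A minimal `train.py` for the launch command above can just reuse the `Trainer` example; the package depends on `accelerate`, and the `Trainer` is assumed to handle device placement and distributed data loading through it. A sketch with random placeholder data (substitute your own):

```python
# train.py - minimal sketch of a launchable training script
import torch
from deep_ensembles_pytorch import DeepEnsemble, TensorDataset, Trainer

ensemble = DeepEnsemble(dim_in = 13, dim_out = 1, num_members = 5, hidden_dim = 100, depth = 3)

dataset = TensorDataset(torch.randn(500, 13), torch.randn(500, 1))

Trainer(ensemble, dataset, train_batch_size = 100, train_num_steps = 10_000).train()
```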

### Accessing Individual Member Predictions

```python
# useful for visualising the ensemble spread
member_preds = ensemble.sample_predictions(X)

for i, pred in enumerate(member_preds):
    print(f'member {i}: μ={pred.mean[:3]}, σ²={pred.log_var.exp()[:3]}')
```
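
The member spread also lets you split the total predictive variance into its two parts: aleatoric (the average noise each member predicts) plus epistemic (the disagreement between member means). A sketch, assuming each member prediction exposes `mean` and `log_var` as in the loop above:

```python
import torch

means = torch.stack([p.mean for p in member_preds])               # (M, B, 1)
variances = torch.stack([p.log_var.exp() for p in member_preds])  # (M, B, 1)

aleatoric = variances.mean(dim = 0)                 # average predicted data noise
epistemic = means.var(dim = 0, unbiased = False)    # disagreement between members

total = aleatoric + epistemic   # equals the mixture variance σ*² below
```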

## How It Works

Three design choices make Deep Ensembles both simple and powerful:

1. **Proper scoring rule** — each member minimises the Gaussian NLL rather than MSE, forcing the network to predict both mean *and* variance (sketched in code after this list):

   ```
   L(θ) = 0.5 · [log σ²_θ(x) + (y − μ_θ(x))² / σ²_θ(x)]
   ```

2. **Random initialisation diversity** — members are independently initialised; no shared weights, no shared data subsets (unlike Bagging).

3. **Mixture-of-Gaussians aggregation** — at inference the ensemble forms a richer predictive distribution than any single member (see the same sketch below):

   ```
   μ* = (1/M) Σ μ_m
   σ*² = (1/M) Σ (σ²_m + μ²_m) − μ*²
   ```
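
Both pieces are a few lines of PyTorch. A sketch with illustrative names, not the package's internals, where member outputs are stacked along a leading `M` dimension:

```python
import torch

def gaussian_nll(mean, log_var, y):
    # proper scoring rule: heteroscedastic Gaussian NLL, per member
    return 0.5 * (log_var + (y - mean) ** 2 / log_var.exp()).mean()

def aggregate(means, log_vars):
    # means, log_vars: (M, B, D) stacked member outputs
    mu_star = means.mean(dim = 0)
    var_star = (log_vars.exp() + means ** 2).mean(dim = 0) - mu_star ** 2
    return mu_star, var_star
```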

An optional FGSM adversarial step smooths each member's predictive distribution during training, shown to improve calibration in the paper.
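
That step is plain FGSM (Goodfellow et al.): perturb the input along the sign of the input gradient of the NLL, then also train on the perturbed batch. A sketch building on `gaussian_nll` from the block above, assuming a single member net that returns `(mean, log_var)`:

```python
import torch

def fgsm_smoothed_loss(member, x, y, eps = 0.01):
    x = x.clone().requires_grad_(True)
    mean, log_var = member(x)
    clean_loss = gaussian_nll(mean, log_var, y)

    # one signed-gradient step on the input
    grad, = torch.autograd.grad(clean_loss, x, retain_graph = True)
    x_adv = (x + eps * grad.sign()).detach()

    # augment: train on the clean and the adversarial batch
    mean_adv, log_var_adv = member(x_adv)
    return clean_loss + gaussian_nll(mean_adv, log_var_adv, y)
```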

## Citation

```bibtex
@article{lakshminarayanan2017simple,
    title   = {Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles},
    author  = {Balaji Lakshminarayanan and Alexander Pritzel and Charles Blundell},
    journal = {Advances in Neural Information Processing Systems},
    year    = {2017},
    url     = {https://arxiv.org/abs/1612.01474}
}
```
