SoLU
Softmax Linear Unit (SoLU) - A PyTorch implementation.
This package implements the Softmax Linear Unit activation function described in https://www.anthropic.com/research/softmax-linear-units. SoLU multiplies the input element-wise by its own softmax, f(x) = x * softmax(x), so an activation that dominates its neighbors along the chosen dimension passes through nearly unchanged while the rest are shrunk toward zero. The linked work reports that this makes MLP neurons in transformer models easier to interpret.
Example:
>>> import torch
>>> from SoLU import SoLU, SoLULayer
>>> x = torch.randn(2, 5, 4)
>>> solu = SoLU()
>>> output = solu(x)
>>> layer = SoLULayer(hidden_size=4)
>>> output = layer(x)
"""Softmax Linear Unit (SoLU) - A PyTorch implementation.

This package implements the Softmax Linear Unit activation function
as described in https://www.anthropic.com/research/softmax-linear-units.
SoLU multiplies the input element-wise by its own softmax,
f(x) = x * softmax(x), so an activation that dominates its neighbors
along the chosen dimension passes through nearly unchanged while the
rest are shrunk toward zero. The linked work reports that this makes
MLP neurons in transformer models easier to interpret.

Example:
    >>> import torch
    >>> from SoLU import SoLU, SoLULayer
    >>> x = torch.randn(2, 5, 4)
    >>> solu = SoLU()
    >>> output = solu(x)
    >>> layer = SoLULayer(hidden_size=4)
    >>> output = layer(x)
"""

from .module import SoLU
from .layers import SoLULayer

__all__ = ["SoLU", "SoLULayer"]
# Imports this excerpt relies on (assumed; the listing omits the top of the file).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoLU(nn.Module):
    """Softmax Linear Unit (SoLU) activation function.

    The SoLU activation function applies a softmax operation along a specified
    dimension and multiplies it element-wise with the input tensor:

        f(x) = x * softmax(x, dim=dim)

    This activation function creates a multiplicative interaction between the
    input and its normalized version, which can help with gradient flow and
    feature learning in deep neural networks.

    Attributes:
        dim: The dimension along which to apply softmax.

    Example:
        >>> import torch
        >>> from SoLU import SoLU
        >>> solu = SoLU(dim=-1)
        >>> x = torch.randn(2, 5, 4)
        >>> output = solu(x)
        >>> assert output.shape == x.shape
    """

    def __init__(self, dim: int = -1):
        """Initialize the SoLU activation function.

        Args:
            dim: The dimension along which to apply softmax. Defaults to -1
                (the last dimension), which is typically the feature dimension
                in transformer architectures.
        """
        super().__init__()
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply the SoLU activation to the input tensor.

        Args:
            x: Input tensor of any shape. The softmax operation will be
                applied along the dimension specified in ``self.dim``.

        Returns:
            A tensor with the same shape as ``x``, where each element is
            the product of the corresponding input element and its softmax
            normalization along the specified dimension.
        """
        return x * F.softmax(x, dim=self.dim)
Softmax Linear Unit (SoLU) activation function.
The SoLU activation function applies a softmax operation along a specified dimension and multiplies it element-wise with the input tensor:
f(x) = x * softmax(x, dim=dim)
This activation function creates a multiplicative interaction between the input and its normalized version, which can help with gradient flow and feature learning in deep neural networks.
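The formula above can be checked by hand without PyTorch. The following is a minimal, dependency-free sketch and not the package's implementation; the helper names `softmax` and `solu` are illustrative and operate on a flat list of floats rather than a tensor.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a flat list of floats."""
    m = max(xs)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def solu(xs):
    """SoLU over a flat list: each input times its own softmax weight."""
    return [x * p for x, p in zip(xs, softmax(xs))]


# Equal inputs share the softmax mass evenly, so each output is x / n.
uniform = solu([2.0, 2.0, 2.0, 2.0])

# One dominant input keeps almost all of its value; the others are
# multiplied by very small softmax weights and end up close to 0.
peaked = solu([8.0, 1.0, 1.0, 1.0])

print(uniform)
print(peaked)
```

The two inputs show the two regimes of the activation: with no dominant element every value is divided by the vector length, and with one dominant element the output is close to a sparse copy of the input.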
Attributes:
- dim: The dimension along which to apply softmax.
Example:
>>> import torch
>>> from SoLU import SoLU
>>> solu = SoLU(dim=-1)
>>> x = torch.randn(2, 5, 4)
>>> output = solu(x)
>>> assert output.shape == x.shape
    def __init__(self, dim: int = -1):
        """Initialize the SoLU activation function.

        Args:
            dim: The dimension along which to apply softmax. Defaults to -1
                (the last dimension), which is typically the feature dimension
                in transformer architectures.
        """
        super().__init__()
        self.dim = dim
Initialize the SoLU activation function.
Arguments:
- dim: The dimension along which to apply softmax. Defaults to -1 (the last dimension), which is typically the feature dimension in transformer architectures.
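To see why the choice of `dim` matters, the sketch below applies the same formula to a small 2-D input, once across each row and once down each column. It is a pure-Python illustration under stated assumptions: `solu_2d` is a hypothetical helper written for this example, not part of the package, and it only handles nested lists with two dimensions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a flat list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def solu_2d(rows, dim=-1):
    """SoLU on a list of rows (hypothetical helper for illustration).

    dim=-1 (or 1) normalizes within each row; dim=0 (or -2) normalizes
    down each column.
    """
    if dim in (-1, 1):
        return [[x * p for x, p in zip(row, softmax(row))] for row in rows]
    if dim in (0, -2):
        cols = [list(col) for col in zip(*rows)]  # transpose
        out_cols = [[x * p for x, p in zip(col, softmax(col))] for col in cols]
        return [list(row) for row in zip(*out_cols)]  # transpose back
    raise ValueError("dim must be 0, 1, -1, or -2 for a 2-D input")


x = [[1.0, 1.0, 1.0],
     [4.0, 0.0, 0.0]]

by_row = solu_2d(x, dim=-1)  # softmax weights sum to 1 across each row
by_col = solu_2d(x, dim=0)   # softmax weights sum to 1 down each column

print(by_row)
print(by_col)
```

With `dim=-1` the first row is uniform, so each entry becomes 1/3. With `dim=0` the same entries compete against the row below them instead, so the results differ even though the input is identical.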
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply the SoLU activation to the input tensor.

        Args:
            x: Input tensor of any shape. The softmax operation will be
                applied along the dimension specified in ``self.dim``.

        Returns:
            A tensor with the same shape as ``x``, where each element is
            the product of the corresponding input element and its softmax
            normalization along the specified dimension.
        """
        return x * F.softmax(x, dim=self.dim)
Apply the SoLU activation to the input tensor.
Arguments:
- x: Input tensor of any shape. The softmax operation will be applied along the dimension specified in self.dim.
Returns:
A tensor with the same shape as x, where each element is the product of the corresponding input element and its softmax normalization along the specified dimension.
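As a worked instance of that element-wise product, take a two-element input chosen so that the softmax weights come out as simple fractions (the values are illustrative only):

```latex
\begin{aligned}
x &= (0,\ \ln 3) \\
\operatorname{softmax}(x) &= \left(\frac{e^{0}}{e^{0} + e^{\ln 3}},\ \frac{e^{\ln 3}}{e^{0} + e^{\ln 3}}\right)
  = \left(\tfrac{1}{4},\ \tfrac{3}{4}\right) \\
\operatorname{SoLU}(x) &= x \odot \operatorname{softmax}(x)
  = \left(0 \cdot \tfrac{1}{4},\ \ln 3 \cdot \tfrac{3}{4}\right)
  \approx (0,\ 0.824)
\end{aligned}
```

The output has the same number of elements as the input, and each entry depends on every other entry along the softmax dimension through the shared denominator.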
# Imports this excerpt relies on (assumed; the listing omits the top of the file).
import torch
import torch.nn as nn

from .module import SoLU


class SoLULayer(nn.Module):
    """A neural network layer combining SoLU activation with LayerNorm.

    This layer implements the SoLU-then-LayerNorm block from the SoLU
    work, which adds a LayerNorm after the activation to recover the
    performance lost when SoLU is used alone:

        f(x) = LayerNorm(SoLU(x))

    LayerNorm rescales the SoLU output to zero mean and unit variance
    across the hidden dimension and then applies its learnable affine
    transform, which keeps the activation scale stable for the layers
    that follow.

    Attributes:
        solu: The SoLU activation function module.
        layer_norm: A LayerNorm module that normalizes across the hidden size.

    Example:
        >>> import torch
        >>> from SoLU import SoLULayer
        >>> layer = SoLULayer(hidden_size=4)
        >>> x = torch.randn(2, 5, 4)  # batch_size=2, seq_len=5, hidden_dim=4
        >>> output = layer(x)
        >>> assert output.shape == x.shape
    """

    def __init__(self, hidden_size: int, dim: int = -1):
        """Initialize the SoLULayer.

        Args:
            hidden_size: The size of the hidden dimension, passed to
                LayerNorm as its ``normalized_shape``.
            dim: The dimension along which to apply softmax in the SoLU
                activation. Defaults to -1 (the last dimension).
        """
        super().__init__()
        self.solu = SoLU(dim=dim)
        self.layer_norm = nn.LayerNorm(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply the SoLU activation and LayerNorm to the input tensor.

        Args:
            x: Input tensor of shape ``(*, hidden_size)`` where ``hidden_size``
                matches the ``hidden_size`` passed to the constructor.

        Returns:
            A tensor with the same shape as ``x``, after applying SoLU
            activation followed by LayerNormalization.
        """
        x = self.solu(x)
        return self.layer_norm(x)
A neural network layer combining SoLU activation with LayerNorm.
This layer implements the SoLU-then-LayerNorm block from the SoLU work, which adds a LayerNorm after the activation to recover the performance lost when SoLU is used alone:
f(x) = LayerNorm(SoLU(x))
LayerNorm rescales the SoLU output to zero mean and unit variance across the hidden dimension and then applies its learnable affine transform, which keeps the activation scale stable for the layers that follow.
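The composition f(x) = LayerNorm(SoLU(x)) can be traced for a single feature vector without PyTorch. This is a rough sketch under stated assumptions, not the package's code: `solu_layer` and the other helpers are illustrative names, and `layer_norm` mimics `torch.nn.LayerNorm` only in its freshly initialized state (weight 1, bias 0, `eps=1e-5`, biased variance).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a flat list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def solu(xs):
    """SoLU over a flat list: each input times its own softmax weight."""
    return [x * p for x, p in zip(xs, softmax(xs))]


def layer_norm(xs, eps=1e-5):
    """Normalize a flat list to zero mean and (almost) unit variance.

    Assumes an untrained LayerNorm: no learned scale or shift is applied.
    """
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)  # biased variance
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def solu_layer(xs):
    """LayerNorm(SoLU(x)) for a single feature vector."""
    return layer_norm(solu(xs))


hidden = [0.5, -1.0, 3.0, 0.0]
out = solu_layer(hidden)

out_mean = sum(out) / len(out)
out_var = sum((v - out_mean) ** 2 for v in out) / len(out)
print(out, out_mean, out_var)
```

SoLU alone shrinks most entries toward zero, so the vector's overall scale depends on how peaked the input is. The normalization step removes that dependence, which is one way to read the role of the second stage.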
Attributes:
- solu: The SoLU activation function module.
- layer_norm: A LayerNorm module that normalizes across the hidden size.
Example:
>>> import torch
>>> from SoLU import SoLULayer
>>> layer = SoLULayer(hidden_size=4)
>>> x = torch.randn(2, 5, 4)  # batch_size=2, seq_len=5, hidden_dim=4
>>> output = layer(x)
>>> assert output.shape == x.shape
    def __init__(self, hidden_size: int, dim: int = -1):
        """Initialize the SoLULayer.

        Args:
            hidden_size: The size of the hidden dimension, passed to
                LayerNorm as its ``normalized_shape``.
            dim: The dimension along which to apply softmax in the SoLU
                activation. Defaults to -1 (the last dimension).
        """
        super().__init__()
        self.solu = SoLU(dim=dim)
        self.layer_norm = nn.LayerNorm(hidden_size)
Initialize the SoLULayer.
Arguments:
- hidden_size: The size of the hidden dimension, passed to LayerNorm as its normalized_shape.
- dim: The dimension along which to apply softmax in the SoLU activation. Defaults to -1 (the last dimension).
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply the SoLU activation and LayerNorm to the input tensor.

        Args:
            x: Input tensor of shape ``(*, hidden_size)`` where ``hidden_size``
                matches the ``hidden_size`` passed to the constructor.

        Returns:
            A tensor with the same shape as ``x``, after applying SoLU
            activation followed by LayerNormalization.
        """
        x = self.solu(x)
        return self.layer_norm(x)
Apply the SoLU activation and LayerNorm to the input tensor.
Arguments:
- x: Input tensor of shape (*, hidden_size), where hidden_size matches the hidden_size passed to the constructor.
Returns:
A tensor with the same shape as x, after applying SoLU activation followed by LayerNormalization.