BatchNormalization
==================

.. py:class:: fleras.fused_renorm.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None, renorm_clipping=None, trainable=True, name=None, **kwargs)

   Bases: :py:obj:`tensorflow.keras.layers.Layer`


   Layer that normalizes its inputs.

   Batch normalization applies a transformation that maintains the mean output
   close to 0 and the output standard deviation close to 1.

   Importantly, batch normalization works differently during training and
   during inference.

   **During training** (i.e. when using `fit()` or when calling the layer/model
   with the argument `training=True`), the layer normalizes its output using
   the mean and standard deviation of the current batch of inputs. That is to
   say, for each channel being normalized, the layer returns
   `gamma * (batch - mean(batch)) / sqrt(var(batch) + epsilon) + beta`, where:

   - `epsilon` is small constant (configurable as part of the constructor
   arguments)
   - `gamma` is a learned scaling factor (initialized as 1), which
   can be disabled by passing `scale=False` to the constructor.
   - `beta` is a learned offset factor (initialized as 0), which
   can be disabled by passing `center=False` to the constructor.

   **During inference** (i.e. when using `evaluate()` or `predict()`) or when
   calling the layer/model with the argument `training=False` (which is the
   default), the layer normalizes its output using a moving average of the
   mean and standard deviation of the batches it has seen during training. That
   is to say, it returns
   `gamma * (batch - self.moving_mean) / sqrt(self.moving_var+epsilon) + beta`.

   `self.moving_mean` and `self.moving_var` are non-trainable variables that
   are updated each time the layer in called in training mode, as such:

   - `moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum)`
   - `moving_var = moving_var * momentum + var(batch) * (1 - momentum)`

   As such, the layer will only normalize its inputs during inference
   *after having been trained on data that has similar statistics as the
   inference data*.

   :param axis: Integer or a list of integers, the axis that should be normalized
                (typically the features axis). For instance, after a `Conv2D` layer with
                `data_format="channels_first"`, set `axis=1` in `BatchNormalization`.
   :param momentum: Momentum for the moving average.
   :param epsilon: Small float added to variance to avoid dividing by zero.
   :param center: If True, add offset of `beta` to normalized tensor. If False,
                  `beta` is ignored.
   :param scale: If True, multiply by `gamma`. If False, `gamma` is not used. When
                 the next layer is linear (also e.g. `nn.relu`), this can be disabled
                 since the scaling will be done by the next layer.
   :param beta_initializer: Initializer for the beta weight.
   :param gamma_initializer: Initializer for the gamma weight.
   :param moving_mean_initializer: Initializer for the moving mean.
   :param moving_variance_initializer: Initializer for the moving variance.
   :param beta_regularizer: Optional regularizer for the beta weight.
   :param gamma_regularizer: Optional regularizer for the gamma weight.
   :param beta_constraint: Optional constraint for the beta weight.
   :param gamma_constraint: Optional constraint for the gamma weight.
   :param renorm_clipping: A dictionary that may map keys 'rmax', 'rmin', 'dmax' to
                           scalar `Tensors` used to clip the renorm correction. The correction `(r,
                           d)` is used as `corrected_value = normalized_value * r + d`, with `r`
                           clipped to [rmin, rmax], and `d` to [-dmax, dmax]. Missing rmax, rmin,
                           dmax are set to inf, 0, inf, respectively.
   :param renorm_momentum: Momentum used to update the moving means and standard
                           deviations with renorm. Unlike `momentum`, this affects training and
                           should be neither too small (which would add noise) nor too large (which
                           would give stale estimates). Note that `momentum` is still applied to
                           get the means and variances for inference.
   :param trainable: Boolean, if `True` the variables will be marked as trainable.

   Call arguments:
     inputs: Input tensor (of any rank).
     training: Python boolean indicating whether the layer should behave in
       training mode or in inference mode.
       - `training=True`: The layer will normalize its inputs using the mean
         and variance of the current batch of inputs.
       - `training=False`: The layer will normalize its inputs using the mean
         and variance of its moving statistics, learned during training.
     mask: Binary tensor of shape broadcastable to `inputs` tensor, indicating
       the positions for which the mean and variance should be computed.

   Input shape: Arbitrary. Use the keyword argument `input_shape` (tuple of
     integers, does not include the samples axis) when using this layer as the
     first layer in a model.

   Output shape: Same shape as input.

   Reference:
     - [Ioffe and Szegedy, 2015](https://arxiv.org/abs/1502.03167).



.. footbibliography::