Deep Learning
Published in Seyedeh Leili Mirtaheri, Reza Shahbazian, Machine Learning Theory to Applications, 2022
Seyedeh Leili Mirtaheri, Reza Shahbazian
Training deep neural networks with tens of convolutional and fully connected layers is challenging. As mentioned in the neural network chapter, the training process can be sensitive to the initial random weights and to the configuration of the learning algorithm. One possible reason for this difficulty is that the distribution of the inputs to layers deep in the network may change after each mini-batch, when the weights are updated. This can cause the learning algorithm to forever chase a moving target. This change in the distribution of inputs to layers in the network is referred to by the technical name internal covariate shift. Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks. Batch normalization accelerates training, in some cases halving the number of epochs or better, and provides some regularization, reducing generalization error.
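The core transform is short; the following NumPy sketch (a minimal illustration, not the chapter's code) standardizes one layer's inputs over a mini-batch and then applies the learnable scale and shift parameters that batch normalization adds.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: mini-batch of shape (batch_size, features)
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learnable scale and shift

# Example: a mini-batch of 4 samples with 3 features, deliberately shifted and scaled
x = np.random.randn(4, 3) * 10 + 5
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0), y.var(axis=0))        # approximately 0 and 1 per feature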
Deep Semantic Segmentation in Autonomous Driving
Published in Mahmoud Hassaballah, Ali Ismail Awad, Deep Learning in Computer Vision, 2020
Hazem Rashed, Senthil Yogamani, Ahmad El-Sallab, Mohamed Elhelw, Mahmoud Hassaballah
Batch normalization layer: This layer is used in deep learning to speed up training by making normalization part of the network. For example, if one input takes values in the range 1 to 1,000 and another in the range 1 to 10, a change from 1 to 2 in the second input represents a 10% change relative to its range, but the same change represents only a 0.1% change relative to the range of the first input. Normalizing the inputs helps the network learn faster and more accurately, without biasing certain dimensions over others. Batch normalization applies a very similar idea to the hidden layers, whose outputs change all the time. Since the activations of a previous layer are the inputs of the next layer, each layer in the neural network faces an input distribution that changes with each step, a problem termed "covariate shift". The basic idea behind batch normalization is to limit covariate shift by normalizing the activations of each layer, transforming the inputs to have zero mean and unit variance. This allows each layer to learn on a more stable distribution of inputs and accelerates training of the network. Limiting covariate shift also helps to avoid the vanishing gradient problem: with a sigmoid activation function, for example, gradients become very small when the input is large, and because the input distribution keeps changing during training, it may drift into a range large enough to cause vanishing gradients [32].
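To make the placement concrete, here is an illustrative PyTorch snippet (the layer sizes are assumptions, not taken from the chapter) that inserts a batch-normalization layer between each hidden layer and its sigmoid activation, so the activation always sees inputs with roughly zero mean and unit variance.

import torch.nn as nn

# Illustrative only: batch normalization applied to hidden pre-activations
# before the sigmoid, keeping them in its non-saturated region.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # normalize hidden pre-activations per mini-batch
    nn.Sigmoid(),
    nn.Linear(64, 64),
    nn.BatchNorm1d(64),
    nn.Sigmoid(),
    nn.Linear(64, 1),
)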
Deep Learning in Brain Segmentation
Published in Saravanan Krishnan, Ramesh Kesavan, B. Surendiran, G. S. Mahalakshmi, Handbook of Artificial Intelligence in Biomedical Engineering, 2021
Batch normalization is a technique for improving the training of neural networks in terms of speed and convergence. Essentially, batch normalization adjusts each mini-batch by subtracting the batch mean from each sample and then dividing by the batch standard deviation. There are two major advantages of using batch normalization. First, by normalizing the activations at each layer, the inter-layer interaction is reduced, meaning that an over-activation in one layer is less likely to affect the upcoming layers. Second, without batch normalization, choosing a high learning rate is likely to cause unstable, fluctuating training. In short, batch normalization helps stabilize the training process.
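As a small worked example of that per-mini-batch adjustment (the numbers are chosen purely for illustration):

import numpy as np

batch = np.array([2.0, 4.0, 6.0, 8.0])      # activations of one unit over a mini-batch
mean, std = batch.mean(), batch.std()       # 5.0 and about 2.236
normalized = (batch - mean) / std           # [-1.342, -0.447, 0.447, 1.342]
print(normalized.mean(), normalized.std())  # 0.0 and 1.0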
Ship target detection of unmanned surface vehicle base on EfficientDet
Published in Systems Science & Control Engineering, 2022
Ronghui Li, Jinshan Wu, Liang Cao
Batch Normalization (BN, Ioffe & Szegedy, 2015) implements a pre-processing operation between the layers of a neural network, i.e. the output of the previous layer is normalized before it enters the next layer of the network, which can effectively prevent 'gradient dispersion' and accelerate network training. There are two common approaches. One is to treat each neuron in a feature map as a separate feature dimension with its own parameter r, which greatly increases the number of parameters, so this approach is not used in this paper. The other, 'shared parameter' approach was adopted here: a whole feature map is considered as one feature dimension, and all the neurons mapped onto that feature map share a single parameter r. Consequently, the required mean and variance can be calculated smoothly in the BN layer during training.
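The 'shared parameter' choice amounts to computing one mean, one variance, and one scale parameter r per feature map, shared across all spatial positions; a rough NumPy sketch of that reduction (illustrative only, not the paper's implementation) follows.

import numpy as np

def bn_shared_per_map(x, r, beta, eps=1e-5):
    # x: feature maps of shape (batch, channels, height, width).
    # Each whole feature map is one feature dimension, so mean, variance,
    # and the scale parameter r are shared across all neurons (spatial
    # positions) of a channel rather than learned per neuron.
    mu = x.mean(axis=(0, 2, 3), keepdims=True)   # one mean per channel
    var = x.var(axis=(0, 2, 3), keepdims=True)   # one variance per channel
    x_hat = (x - mu) / np.sqrt(var + eps)
    return r.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

x = np.random.randn(8, 16, 32, 32)               # 8 images, 16 feature maps (illustrative sizes)
y = bn_shared_per_map(x, r=np.ones(16), beta=np.zeros(16))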
Displacement and strain data-driven damage detection in multi-component and heterogeneous composite structures
Published in Mechanics of Advanced Materials and Structures, 2022
Convolutional neural networks are fed with 3D input. A network architecture that can receive and process this input can be built by employing three types of layers. Convolutional layers have 2D learnable filters as parameters, which are slid over the input. Each such layer computes the dot product between these filters and the current small region of the input, creating a feature map. These maps are used as input for the following layers. Then, the pooling layer performs a downsampling operation using maximum or average operations. Finally, the fully connected layer computes a weighted sum of its inputs, as in a classical feedforward neural network. It should also be underlined that regularization procedures are generally adopted in CNN training in order to prevent poor generalization of the model to non-training data. In the present work, dropout regularization and batch normalization layers are employed. An example of CNN architecture is shown in Figure 4.
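For reference, a compact PyTorch sketch of such an architecture is given below; the input size (3 × 32 × 32), channel counts, and number of outputs are assumptions for illustration, not the network used in the paper.

import torch.nn as nn

# Illustrative CNN: convolution + batch normalization + ReLU, pooling for
# downsampling, dropout for regularization, and a fully connected head.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 2D learnable filters slid over the input
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsampling
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(0.5),                              # dropout regularization
    nn.Linear(32 * 8 * 8, 10),                    # fully connected layer (assuming 32x32 input)
)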
Wear Debris Recognition and Quantification in Ferrography Images by Instance Segmentation
Published in Tribology Transactions, 2022
Kang Sun, Xinliang Liu, Guoning Chen, Jingqiu Wang
In the mask branch, convolution, batch normalization (29), and activation operations are often performed in combination; such a structure is referred to as ConvUnit in this article. Batch normalization is often used to improve the training speed and simplify the parameter adjustment process in a CNN. The commonly used activation function is the rectified linear unit (30). The size of the input feature map generated from the RoIAlign layer is 20 × 20. After four ConvUnit operations, the size of the feature map is unchanged. Then, a deconvolution is applied to double the size of the feature map. Finally, a 1 × 1 convolution is performed to change the number of channels so that the channels of the output feature map match the categories of wear debris. The output categories comprise five types of wear debris plus the image background, so the total number of categories is six.
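A rough PyTorch sketch of the mask branch as described is given below; the spatial sizes and category count follow the text, while the channel width (256) and other details are assumptions rather than the authors' implementation.

import torch.nn as nn

def conv_unit(in_ch, out_ch):
    # ConvUnit: convolution + batch normalization + ReLU activation
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # keeps the 20x20 size
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )

mask_branch = nn.Sequential(
    conv_unit(256, 256),   # four ConvUnits; feature map stays 20x20
    conv_unit(256, 256),
    conv_unit(256, 256),
    conv_unit(256, 256),
    nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2),   # deconvolution: 20x20 -> 40x40
    nn.ReLU(),
    nn.Conv2d(256, 6, kernel_size=1),   # 1x1 conv: 5 wear-debris types + background
)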