Evolution of Long Short-Term Memory (LSTM) in Air Pollution Forecasting
Published in Monika Mangla, Subhash K. Shinde, Vaishali Mehta, Nonita Sharma, Sachi Nandan Mohanty, Handbook of Research on Machine Learning, 2022
Satheesh Abimannan, Deepak Kochhar, Yue-Shan Chang, K. Thirunavukkarasu
As discussed, LSTMs are state-of-the-art models for time series forecasting. Air pollution forecasting falls within the same domain, and LSTMs and their variants have therefore pioneered this work, as is evident from the various implementations discussed in detail in the upcoming sections. The data is first divided into independent and target feature sets and then split into training, validation, and testing sets. The cleaned and engineered numeric air pollution data, in time series format, is fed to the LSTM model, which is trained over several cycles while accuracy is monitored; the hyperparameters are tuned after every cycle to further improve accuracy. Methods such as dropout regularization (introduced by Hinton and colleagues) are used to reduce overfitting in the forecasting model: dropout randomly drops units out of the neural network during training to prevent complex co-adaptations on the training data. Apart from dropout, other regularization techniques such as Lasso (L1) and Ridge (L2) regularization are also used; the choice depends on the use case. The best-performing model is saved and used to predict the final values, which are then compared against the original values to benchmark model performance using metrics such as accuracy, precision, recall, F1-score, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
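A minimal sketch of this pipeline is given below, assuming a Keras/TensorFlow stack and a univariate pollutant series; the window length, layer sizes, dropout rate, and training settings are illustrative placeholders rather than values from the chapter.

```python
# Minimal sketch of the LSTM-with-dropout forecasting pipeline described above.
# Assumes TensorFlow/Keras; all sizes and rates are illustrative placeholders.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense

def make_windows(series, window=24):
    """Slice a 1-D pollutant series into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X)[..., np.newaxis], np.array(y)

series = np.random.rand(1000)                 # placeholder for cleaned pollutant readings
X, y = make_windows(series)
split = int(0.8 * len(X))                     # simple train/validation split
X_train, X_val, y_train, y_val = X[:split], X[split:], y[:split], y[split:]

model = Sequential([
    Input(shape=(X.shape[1], 1)),
    LSTM(64),
    Dropout(0.2),                             # randomly drops units to curb co-adaptation
    Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, batch_size=32)

mse, mae = model.evaluate(X_val, y_val, verbose=0)
rmse = np.sqrt(mse)                           # RMSE benchmark against held-out values
```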
Deep Learning Models for Visual Computing
Published in Hassan Ugail, Deep Learning in Visual Computing, 2022
Various activation functions have been used in CNNs. Today, the rectified linear unit (ReLU) is the most popular activation function; it is simply the ramp function f(z) = max(z, 0). During the backward pass, the weights are tuned to minimise the error. This is achieved by a technique known as backpropagation. The procedure computes the partial derivatives of the error with respect to the weights by working backward through the network. These derivatives indicate by how much the error changes as a result of a small change in each weight. The weights are then adjusted in the direction opposite to the computed gradient. After the weights have been adjusted the output error changes, so the partial derivatives have to be recomputed before the next iteration. Due to the large number of parameters in a neural network, the model often overfits: it performs excellently on the training dataset, classifying it with minimal error, but fails to generalise to a new, unseen dataset. Methods for avoiding overfitting include the use of a large training dataset, stopping the training as soon as performance on a validation set starts to get worse, regularisation, and dropout.
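The forward pass, the ReLU ramp, and the gradient step can be made concrete with a toy example. The sketch below uses a one-hidden-layer network in plain numpy; the layer sizes, learning rate, and synthetic data are arbitrary assumptions, chosen only to show the recompute-and-adjust loop.

```python
# Toy illustration of the forward/backward pass described above, using numpy.
# One hidden layer with ReLU; sizes, data, and learning rate are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                 # a small batch of inputs
y = rng.normal(size=(32, 1))                 # targets
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 1))
lr = 0.01

for step in range(100):
    # Forward pass: ReLU is the ramp function f(z) = max(z, 0)
    z = X @ W1
    h = np.maximum(z, 0)
    y_hat = h @ W2
    err = y_hat - y
    loss = np.mean(err ** 2)

    # Backward pass: partial derivatives of the error with respect to the weights
    d_out = 2 * err / len(X)
    dW2 = h.T @ d_out
    dh = d_out @ W2.T
    dW1 = X.T @ (dh * (z > 0))               # ReLU gradient is zero where z <= 0

    # Step against the gradient; the derivatives must be recomputed next iteration
    W1 -= lr * dW1
    W2 -= lr * dW2
```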
Applications of Machine Learning in Economic Data Analysis and Policy Management
Published in P. Kaliraj, T. Devi, Artificial Intelligence Theory, Models, and Applications, 2021
L1 Regularization: It adds the sum of the absolute weights, scaled by a multiplier, as an additional penalty term to the existing loss function. The resulting cost function is
CostFunction = Loss + λ ∑_{j=1}^{n} |β_j|,
where λ is the regularization rate and β_j are the feature weights.
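Written as code, the penalty is simply the sum of absolute weights scaled by λ. The sketch below assumes a numpy weight vector and an arbitrary base loss value; the function name, the value of lam, and the example numbers are illustrative.

```python
# Sketch of the L1-penalized cost function above; lam and the base loss are illustrative.
import numpy as np

def l1_cost(loss, beta, lam=0.1):
    """CostFunction = Loss + lam * sum(|beta_j|)."""
    return loss + lam * np.sum(np.abs(beta))

beta = np.array([0.5, -1.2, 0.0, 3.0])          # example feature weights
print(l1_cost(loss=2.34, beta=beta, lam=0.1))   # 2.34 + 0.1 * 4.7 = 2.81
```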
Leak localization in water distribution networks using GIS-Enhanced autoencoders
Published in Urban Water Journal, 2023
Michael Weyns, Ganjour Mazaev, Guido Vaes, Filip Vancoillie, Filip De Turck, Sofie Van Hoecke, Femke Ongenae
Once the data has been prepared, we define a number of hyperparameters for each of the potential autoencoder architectures defined in Tables 4 and 5. As stated in Subsection 3.1, these autoencoder architectures will all be evaluated. Both the RNN and the T-GCN are characterized by two tunable parameters: the L2 regularization strength and the time window. L2 regularization is commonly used to prevent overfitting by adding a regularization term (the squared magnitude of the coefficients) to the loss function; this term penalizes overly large coefficients. The time window is used by the temporal layers in each autoencoder setup and simply indicates how many time steps the model considers for a given prediction. Finally, for each setup we also have to determine the number of encoder layers (which implies the same number of decoder layers), as well as the output dimensions of the outer RNN layers (from which all other output dimensions are derived). The dimension of the outer GCN layer is set to the squared temporal dimension. The hyperparameter ranges that were investigated can be found in Table 6.
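As an illustration only (not the authors' implementation), the sketch below shows how these two tunable parameters, the L2 regularization strength and the time window, might enter a single recurrent encoder layer in Keras; the layer type, dimensions, and values are placeholder assumptions.

```python
# Illustrative sketch (not the paper's code) of how the L2 strength and time
# window could parameterize one recurrent encoder layer. Values are placeholders.
from tensorflow.keras import Sequential, layers, regularizers

time_window = 12        # how many past time steps the temporal layer considers
l2_strength = 1e-4      # weight on the squared-magnitude penalty in the loss
n_signals = 20          # hypothetical number of input signals per time step

encoder = Sequential([
    layers.Input(shape=(time_window, n_signals)),
    layers.SimpleRNN(
        32,
        kernel_regularizer=regularizers.l2(l2_strength),  # penalizes overly large coefficients
    ),
])
```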
On the Generalizability of Machine-Learning-Assisted Anisotropy Mappings for Predictive Turbulence Modelling
Published in International Journal of Computational Fluid Dynamics, 2022
Ryley McConkey, Eugene Yee, Fue-Sang Lien
It is noted that the formulation in Equation (11) can also be extended to an arbitrary tensor basis. In words, the formulation in Equation (11) seeks the solution to the following optimisation problem: find the solution that represents the best compromise between minimising the errors in the approximation of the anisotropy tensor and constraining the magnitude of the coefficients to be as small as possible (viz., shrinking the values of these coefficients toward zero). In particular, the objective function includes an L2 regularisation (penalty) term that constrains the squared norm of the optimised coefficients. We found this regularisation term was crucial for stabilising and reducing the magnitude of these coefficients, thereby increasing their usability as machine learning prediction targets.
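A generic illustration of this compromise is the ridge-regularised least-squares problem sketched below; it is not the authors' formulation, and the matrix T, target b, and regularisation strength lam are synthetic placeholders standing in for the tensor-basis features and anisotropy components.

```python
# Generic ridge (L2-penalized) least-squares sketch of the trade-off described
# above: fit coefficients g approximating target b from basis columns T while
# shrinking ||g||^2. Synthetic data; not the authors' implementation.
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(200, 5))        # columns stand in for tensor-basis features
b = rng.normal(size=200)             # stand-in for anisotropy components
lam = 1.0                            # regularisation strength

# Closed-form solution of  min_g ||T g - b||^2 + lam * ||g||^2
g = np.linalg.solve(T.T @ T + lam * np.eye(T.shape[1]), T.T @ b)
print(g)                             # coefficients shrink toward zero as lam grows
```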
Identification of criteria for the selection of buildings with elevated energy saving potentials from hydraulic balancing-methodology and case study
Published in Advances in Building Energy Research, 2022
Haein Cho, Daniel Cabrera, Martin K. Patel
With the goal of selecting a subset of variables that explain most of the variability in indoor temperature across the flats, we apply a regularization method. Regularization allows us to develop low-dimensional models for systems that are characterized by a large number of variables. Two types of regularization are commonly used: Ridge and Least Absolute Shrinkage and Selection Operator (LASSO) regression. Ridge penalizes the regression model by imposing a regularization term called the 'L2 penalty', the sum of the squared coefficients, while LASSO applies a penalty term called the 'L1 penalty', which is proportional to the sum of the absolute coefficients. Unlike Ridge, LASSO regression is capable of zeroing out coefficients with little explanatory power through the L1 penalty. LASSO is therefore more suitable for our study, which aims to select a subset of variables with major contributions.
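The difference can be seen on a small synthetic example (unrelated to the case-study data): LASSO drives the coefficients of uninformative variables to exactly zero, while Ridge only shrinks them. The variable names and penalty strengths below are illustrative.

```python
# Illustrative comparison of LASSO (L1) and Ridge (L2) on synthetic data:
# only two of six candidate variables actually influence the response.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))                                # candidate explanatory variables
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)   # only the first two matter

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
print("LASSO:", np.round(lasso.coef_, 3))    # uninformative coefficients driven to 0
print("Ridge:", np.round(ridge.coef_, 3))    # small but non-zero coefficients remain
```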