Generalized Regression Penalty against Complexity
Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022
L2 regularization is used in ridge regression to counteract collinearity. When multicollinearity occurs, the variances of the estimated coefficients are large, and the estimates can therefore be far from the true values. Ridge regression is an effective counter-measure because it allows for better interpretation of the regression coefficients by imposing some bias on them and shrinking their variances (Morris 1982; Pagel and Lunneborg 1985). In L2 regularization, coefficients are shrunk, or biased, toward zero, but never exactly to zero. As a result, coefficients estimated by ridge regression are less variable than those estimated by OLS. Ridge regression is useful when the goal is to retain more predictors in the model. Adaptive ridge is another variant of ridge regression. It has been found that adaptive ridge is equivalent to its LASSO counterpart and in most cases produces identical results (Grandvalet and Canu 1998), so no further discussion of adaptive ridge is needed.
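To illustrate the shrinkage described above, here is a minimal sketch (not from the cited chapter) assuming scikit-learn and NumPy are available; the simulated data and penalty values are illustrative.

```python
# Sketch: ridge coefficients shrink toward zero (but not exactly zero) as the penalty grows.
# Assumes scikit-learn and NumPy; data and penalty values are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=100)    # introduce collinearity
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=100)

print("OLS:", np.round(LinearRegression().fit(X, y).coef_, 3))
for alpha in (0.1, 1.0, 10.0, 100.0):
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"ridge alpha={alpha:>6}:", np.round(coefs, 3))  # shrunk, never exactly zero
```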
Multivariable Linear Regression
Published in Harry G. Perros, An Introduction to IoT Analytics, 2021
In ridge regression, the above cost function is modified by adding a penalty, which is the squared sum of the coefficients, i.e., $C_{\text{ridge}} = \sum_{i=1}^{n}\left(y_i - a_0 - \sum_{j=1}^{k} a_j x_{ij}\right)^2 + \lambda \sum_{j=1}^{k} a_j^2$. Setting λ to zero is the same as minimizing the SSE cost function, while the larger the value of λ, the greater the effect of the penalty. Parameter λ is known as the regularization parameter and the penalty is known as the regularization penalty. We note that regularization penalties are used in many techniques, such as decision trees (Chapter 9), neural networks (Chapter 10), and support vector machines (Chapter 11).
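As a concrete illustration of this cost function, the sketch below (not from the cited book) evaluates $C_{\text{ridge}}$ and a closed-form ridge estimate with NumPy; the data, λ value, and variable names are assumptions made for the example, and only the intercept $a_0$ is left unpenalized.

```python
# Sketch: evaluating the ridge cost and its closed-form minimizer with NumPy.
# Data, lambda value, and names are illustrative; the intercept a0 is not penalized.
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)
lam = 2.0

# Augment with a column of ones for the intercept a0.
Xa = np.column_stack([np.ones(n), X])
# Penalty matrix: penalize only a_1..a_k, not the intercept.
P = np.eye(k + 1)
P[0, 0] = 0.0

# Closed-form ridge estimate: (Xa'Xa + lambda * P)^(-1) Xa'y
a_hat = np.linalg.solve(Xa.T @ Xa + lam * P, Xa.T @ y)

# Ridge cost: SSE plus lambda times the squared sum of the coefficients.
residuals = y - Xa @ a_hat
C_ridge = residuals @ residuals + lam * (a_hat[1:] @ a_hat[1:])
print("coefficients:", np.round(a_hat, 3))
print("C_ridge:", round(C_ridge, 3))
```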
Statistics
Published in Rakesh M. Verma, David J. Marchette, Cybersecurity Analytics, 2019
Rakesh M. Verma, David J. Marchette
A related method that uses a slightly different penalty is the Lasso. The Lasso penalizes using the absolute values: $\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$. By increasing λ, one places a greater and greater penalty on the absolute values of the terms in β, which has the effect of driving more and more terms to 0. Thus, the Lasso can be used as a feature selection method, as well as a way of producing sparse models. See [428] and [429]. In a sense, the Lasso implements a more extreme version of model complexity reduction than ridge regression.
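A minimal sketch of this feature-selection behavior (not from the cited book), assuming scikit-learn and NumPy; the simulated data and λ grid are illustrative.

```python
# Sketch: the L1 penalty drives some coefficients exactly to zero, giving sparse models.
# Assumes scikit-learn and NumPy; data and penalty values are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)  # only 2 informative features

for lam in (0.01, 0.1, 1.0):
    beta = Lasso(alpha=lam).fit(X, y).coef_
    selected = np.flatnonzero(beta != 0)                   # surviving (selected) features
    print(f"lambda={lam}: {len(selected)} nonzero coefficients, indices {selected}")
```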
Empirical analysis of the impact of collaborative care in internal medicine: Applications to length of stay, readmissions, and discharge planning
Published in IISE Transactions on Healthcare Systems Engineering, 2023
Paul M. Cronin, Douglas J. Morrice, Jonathan F. Bard, Luci K. Leykum
The backward selection regression model estimated via the BIC minimization criterion served as a baseline model to compare with our favored approach for this problem, the elastic net model. The elastic net is a hybrid of the lasso and ridge regression frameworks. Least absolute shrinkage and selection operator (lasso) and ridge estimation are two shrinkage techniques that can guard against overfitting and reduce model complexity (Hoerl & Kennard, 1970; Tibshirani, 1996). More precisely, lasso and ridge regression are forms of penalized regression which apply the L1 and L2 norms, respectively, to the objective function. In the case of ridge, the coefficients are shrunk toward zero, but the model ultimately includes all the coefficients. One of the key features is that a minor increase in bias is rewarded with, ideally, reduced error variance and more reliable coefficient t-tests (James et al., 2021). Lasso differs from ridge in several ways, including simultaneously performing both estimation and variable selection by shrinking less important variable coefficients to zero (not just toward it).
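The sketch below (not the study's actual model) shows how the elastic net blends the two penalties; it assumes scikit-learn and NumPy, and the data, alpha, and l1_ratio values are illustrative.

```python
# Sketch: elastic net blends the lasso (L1) and ridge (L2) penalties via l1_ratio.
# Assumes scikit-learn and NumPy; data and settings are illustrative.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=150)     # a correlated pair of predictors
y = 2.0 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(size=150)

# l1_ratio close to 1.0 behaves like the lasso; close to 0.0 it approaches ridge.
for l1_ratio in (0.2, 0.5, 0.9):
    model = ElasticNet(alpha=0.5, l1_ratio=l1_ratio).fit(X, y)
    print(f"l1_ratio={l1_ratio}:", np.round(model.coef_, 3))
```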
Innovative AI-based multi-objective mixture design optimisation of CPB considering properties of tailings and cement
Published in International Journal of Mining, Reclamation and Environment, 2023
Ehsan Sadrossadat, Hakan Basarir, Ali Karrech, Mohamed Elchalakani
Multicollinearity is a problem that occurs when there is considerable linear correlation between independent variables or predictors. In this situation, the coefficients of linear regression (estimated using OLS) can be wrong and accordingly misinterpreted, which leads to overfitting. To deal with multicollinearity, one or more of the highly correlated variables can be removed, a combination of variables can be used in place of a single variable, or other dimensionality reduction approaches such as Principal Component Analysis (PCA) may be applied. Multicollinearity must also be handled in ML algorithms, since the relationships between variables may be nonlinear or unknown while achieving the lowest possible prediction error is the priority. It is worth mentioning that ridge regression is commonly used in ML algorithms rather than the OLS method. Ridge regression reduces the variance by adding a slight bias to the estimates. In ML, this is called regularisation. Regularisation, or shrinkage, algorithms are used to estimate reliable predictor coefficients when the predictors are highly correlated. By imposing different penalties, ridge regression keeps all predictors in the final model, while LASSO ensures sparsity of the results by shrinking some coefficients exactly to zero.
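A small simulation sketch (not from the cited paper) of the bias-variance trade mentioned above, assuming scikit-learn and NumPy; the sample sizes, correlation level, and penalty are illustrative.

```python
# Sketch: under multicollinearity, ridge trades a little bias for much lower coefficient variance.
# Assumes scikit-learn and NumPy; the simulation settings are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
ols_coefs, ridge_coefs = [], []
for _ in range(200):                                  # repeated samples from the same process
    x1 = rng.normal(size=60)
    x2 = x1 + 0.05 * rng.normal(size=60)              # highly correlated predictors
    X = np.column_stack([x1, x2])
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=60)
    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    ridge_coefs.append(Ridge(alpha=5.0).fit(X, y).coef_)

print("OLS coefficient std:  ", np.round(np.std(ols_coefs, axis=0), 3))    # large, unstable
print("Ridge coefficient std:", np.round(np.std(ridge_coefs, axis=0), 3))  # much smaller, slightly biased
```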
Investigation of Data Size Variability in Wind Speed Prediction Using AI Algorithms
Published in Cybernetics and Systems, 2020
M. A. Ehsan, Amir Shahirinia, Nian Zhang, Timothy Oladunni
Ridge regression is a variant of linear regression that uses L2 regularization (Exterkate et al. 2016). The fundamental equation for ridge regression is the same as that of linear regression with a constraint on it, as described in (7), where C defines the boundaries of ridge regression. The regularization shrinks the parameters and reduces the model complexity through a coefficient of shrinkage. This coefficient is known as the penalty and is denoted by λ (a hyperparameter). Looking at the equation for ridge regression, it is clear that the first part of (8) is the same as linear regression; the true difference between linear and ridge regression is the second term, which contains the constraint B shown in (9) and the penalty. Because the penalty is included, the prediction error is reduced, and ridge regression can therefore produce better accuracy.
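Since the penalty λ is a hyperparameter, it is typically chosen by validation. A minimal sketch (not from the cited article) of cross-validated selection of λ, assuming scikit-learn and NumPy; the candidate grid and simulated data are illustrative.

```python
# Sketch: choosing the ridge penalty (the lambda hyperparameter) by cross-validation.
# Assumes scikit-learn and NumPy; the candidate grid and data are illustrative.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + rng.normal(size=120)

# 5-fold cross-validation over a logarithmic grid of candidate penalties.
model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print("selected penalty:", model.alpha_)
print("coefficients:    ", np.round(model.coef_, 3))
```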