Multiple Linear Regression
Published in Jhareswar Maiti, Multivariate Statistical Modeling in Engineering and Management, 2023
If we want to develop an MLR model, multicollinearity should be removed. However, the researcher or analyst should not build a statistical model by sacrificing relevant information. For example, if the predictor variables are inherently related, it is advisable to preserve the correlated structure during model building and to use an appropriate statistical model accordingly; the path model (Chapter 10) may be used in such cases. Nevertheless, in many situations an MLR is preferable, and multicollinearity, if it exists, should be removed. The most frequently used techniques are ridge regression and principal component regression. Because multicollinearity inflates the variance of the estimated parameters, ridge regression adds a weight (penalty) during estimation to counteract the inflated variance of the estimates. For a discussion of ridge regression, see Vinod (1978). Principal component regression transforms the predictor variables into orthogonal (independent) dimensions, and the regression is then carried out between Y and the orthogonal principal components (PCs). Principal component analysis (PCA) is discussed in Chapter 11. The difficulty with principal component regression lies in interpreting the regression coefficients. However, if one is interested in predicting future y values, principal component regression is very useful.
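The variance-stabilizing effect of the ridge penalty can be seen in a minimal sketch. This is not from the excerpt; it assumes two nearly collinear predictors and uses the standard closed-form ridge estimate, with the penalty weight `lam` chosen arbitrarily for illustration:

```python
import numpy as np

# Illustrative sketch (not the chapter's own example): with near-collinear
# predictors, ordinary least squares yields unstable, inflated coefficients;
# adding a ridge weight lam to the normal equations shrinks them.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)    # lam = 0 recovers ordinary least squares
beta_ridge = ridge(X, y, 1.0)  # lam > 0 shrinks the estimates
```

Increasing `lam` trades a small amount of bias for a large reduction in coefficient variance, which is exactly the adjustment the text describes.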
Machine learning methods for computational social science
Published in Uwe Engel, Anabel Quan-Haase, Sunny Xun Liu, Lars Lyberg, Handbook of Computational Social Science, Volume 2, 2021
One of the most popular feature creation methods is principal components. The principal components zj are defined so that the first principal component is the linear combination of the predictors (suitably normalized) with the highest variance. (If you think of the data as forming an ellipse in p dimensions, this direction is the major axis.) Each subsequent component is the direction of highest variance that is orthogonal to all previous components. This is repeated until all p components are found. The hope of principal component analysis is that most of the total variance in the predictors, and thus the information they contain, can be captured by using only the first few components. It is the same goal pursued by file compression algorithms. Although the response variable plays no part in the selection of these components, the hope of principal component regression is that the response variable is well approximated by a linear regression on some subset of these new variables. There are many variants of principal components with slightly different criteria, or constraints, including factor analysis, multidimensional scaling, and partial least squares.
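The geometric picture above (major axis of an ellipse, orthogonal subsequent directions) can be sketched directly; all names here are mine, and the first principal direction is obtained as the top right-singular vector of the centered data matrix:

```python
import numpy as np

# Sketch: the first principal component is the unit vector along which the
# centered data has maximum variance; subsequent components are orthogonal
# directions of next-highest variance.
rng = np.random.default_rng(1)
base = rng.normal(size=(500, 1))      # shared direction: the "major axis"
X = np.column_stack([base + 0.1 * rng.normal(size=(500, 1)),
                     base + 0.1 * rng.normal(size=(500, 1))])
Xc = X - X.mean(axis=0)               # center each predictor
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                           # direction of highest variance
pc2 = Vt[1]                           # orthogonal, next-highest variance
explained = s**2 / np.sum(s**2)       # fraction of variance per component
```

Here the first component captures nearly all of the total variance, which is the compression-like behavior the excerpt describes.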
Modern Predictive Analytics and Big Data Systems Engineering
Published in Anna M. Doro-on, Handbook of Systems Engineering and Risk Management in Control Systems, Communication, Space Technology, Missile, Security and Defense Operations, 2023
Principal component regression (PCR) is a regression procedure, similar in spirit to ridge regression, that is based on PCA. The fundamental concept is to determine the directions in x-space that capture the most variation; we then fit a regression model in this lower-dimensional space using the least-squares method. With PCR, a high-dimensional space can be reduced to two or three dimensions, which are far easier to interpret and visualize.
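A hedged sketch of the two-step procedure just described: project the predictors onto the top-k principal directions, then run ordinary least squares in the reduced space. The function names and the synthetic data are illustrative, not from the handbook:

```python
import numpy as np

# Sketch of PCR: (1) find the top-k directions of variation in x-space,
# (2) fit least squares on the projected (lower-dimensional) data.
def pcr_fit(X, y, k):
    Xc = X - X.mean(axis=0)                  # center predictors
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                             # p x k projection (loadings)
    Z = Xc @ W                               # scores: n x k, orthogonal columns
    gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    beta = W @ gamma                         # map back to original predictors
    return beta, X.mean(axis=0), y.mean()

def pcr_predict(Xnew, beta, x_mean, y_mean):
    return (Xnew - x_mean) @ beta + y_mean

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 10))
X[:, 5:] = X[:, :5] + 0.05 * rng.normal(size=(80, 5))  # redundant columns
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=80)
beta, xm, ym = pcr_fit(X, y, k=5)            # keep only 5 of 10 dimensions
yhat = pcr_predict(X, beta, xm, ym)
```

Because the last five columns are near-copies of the first five, the top five components carry essentially all the signal, so the reduced-space fit loses little.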
Prediction of octane numbers for commercial gasoline using distillation curves: a comparative regression analysis between principal component and partial least squares methods
Published in Petroleum Science and Technology, 2022
Equation (3) allows quantifying the loading weights of these variables (Gemperline 2006). The correlation between the X and y values was computed using the XLSTAT software (version 2014 for Windows). Initially, the collected commercial gasoline samples were separated into two main groups, calibration and prediction, with 229 and 115 samples, respectively, according to the Kennard-Stone algorithm (Kennard and Stone 1969). The PLS model established from the calibration set was assessed by leave-one-out internal cross-validation. In the principal component regression (PCR) technique, the explanatory variables X are decomposed using principal component analysis (PCA) into loadings and scores; this decomposition makes no use of the response variable y. To correlate y and X in a linear relationship, a multiple linear regression model is then developed from the selected loadings and scores. The model is created by combining linear factors of the X variables, resulting in a linear relationship between the X variables and the dependent variable y.
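The loadings-and-scores decomposition the excerpt refers to can be verified numerically. This sketch uses my own variable names and random data, not the gasoline dataset; it shows that the centered X matrix factors exactly into scores T and loadings P, with y playing no role in this step:

```python
import numpy as np

# PCA decomposition of X alone: X_c = T P', where T holds scores and
# P holds loadings. The response y is not involved in this factorization.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt.T                            # loadings: 6 x 6
T = Xc @ P                          # scores:   40 x 6 (equivalently U * s)
X_rebuilt = T @ P.T                 # exact reconstruction, all components kept
k = 2
X_approx = T[:, :k] @ P[:, :k].T    # rank-k approximation from first k PCs
```

Regressing y on the first few columns of T is then an ordinary multiple linear regression, which is how PCR links the two steps.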
Multimodal data fusion for systems improvement: A review
Published in IISE Transactions, 2022
Nathan Gaw, Safoora Yousefi, Mostafa Reisi Gahrooei
Early fusion (or low-level fusion) is the process of fusing modalities using only information from the predictors (i.e., independent variables). Early fusion can occur either as a preprocessing task before incorporation into the main model, or as a purely unsupervised task to generate features that best describe the underlying patterns across modalities. In feature preprocessing, the main goal is to combine raw features from different modalities to generate new features that capture their complementary information. These new features are then input to a supervised model for a training task. Early fusion as a purely unsupervised task has the goal of combining features across modalities to discern underlying patterns present across different modalities or to generate visualizations that aptly describe information from the different modalities (e.g., combining different types of medical imaging to generate another image that displays complementary information) (He et al., 2010; Moin et al., 2016; Rajalingam and Priya, 2017). Principal component regression is an example of early fusion, in which Principal Component Analysis (PCA) is performed to extract input features that are then employed to predict an output value.
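A minimal sketch of PCR as early fusion, under assumptions of my own (two synthetic "modalities" sharing one latent factor; all names are illustrative): raw features are concatenated, PCA extracts fused features from the predictors alone, and those features feed a simple supervised predictor.

```python
import numpy as np

# Early fusion sketch: concatenate raw features from two modalities,
# extract fused features with PCA (unsupervised, predictors only),
# then use the fused features in a supervised regression.
rng = np.random.default_rng(4)
n = 60
latent = rng.normal(size=(n, 1))                     # shared structure
modality_a = latent + 0.2 * rng.normal(size=(n, 4))  # e.g., imaging features
modality_b = latent + 0.2 * rng.normal(size=(n, 3))  # e.g., clinical features
y = latent.ravel() + 0.1 * rng.normal(size=n)

fused = np.hstack([modality_a, modality_b])          # early fusion: concatenate
fused_c = fused - fused.mean(axis=0)
_, _, Vt = np.linalg.svd(fused_c, full_matrices=False)
Z = fused_c @ Vt[:2].T                               # top-2 fused PCA features
design = np.column_stack([np.ones(n), Z])            # intercept + fused features
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
yhat = design @ coef                                 # supervised prediction
```

The fusion step never sees `y`, matching the text's point that early fusion uses information from the predictors only.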
Multi-mechanical properties comprehensive evaluation by single excitation mode using controlled laser air-force detection (CLAFD) technique
Published in Soft Materials, 2021
Hubo Xu, Yingzi Lin, Beibei Zhang, Juan Hincapie, Xiuying Tang
As a good alternative to classical multiple linear regression, the principal component regression method is more robust. Global-variable partial least-squares regression (Gv-PLSR) was used in this study to build the quantitative prediction model for the multi-mechanical properties of polyurethane. Gv-PLSR is a multivariate calibration method with factor analysis based on principal component analysis (PCA). In PCA, only the independent-variable matrix is decomposed, and redundant information is eliminated. In Gv-PLSR, however, both the dependent-variable matrix and the independent-variable matrix are analyzed. In this study, the multi-mechanical property data were the dependent variables and the laser-response data were the independent variables; the former were introduced into the decomposition of the latter. Therefore, the principal components of the laser signal were associated with the multi-mechanical properties of polyurethane. The basic process of modeling is as follows: