Forecasting in the air transport industry
Published in Bijan Vasigh, Ken Fleming, Thomas Tacker, Introduction to Air Transport Economics, 2018
Another potential major problem that Table 10.24 highlights is multicollinearity. Multicollinearity occurs when two or more independent variables are highly correlated with each other. If two independent variables are perfectly correlated, the coefficient estimates cannot be computed at all. Intuitively, the problem arises because the regression cannot separate the effects of the perfectly correlated independent variables. Quantitatively, it arises because the formula for the variance of each estimated coefficient contains the term (1 − r²) in its denominator, where r is the correlation between the independent variables. As r approaches 1 (perfect correlation), this term approaches zero and the variances of the coefficient estimates approach infinity. And, as we have seen above, the t-statistic is calculated by dividing the numerical value of the coefficient by its standard error. Since the standard error is simply the square root of the estimated variance, the larger the variance, the larger the standard error and the smaller the t-value. Thus, a high degree of multicollinearity between independent variables can render one or both of them statistically insignificant.
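To make the denominator argument concrete, the variance of a slope estimate in the two-regressor case can be written as follows (standard textbook notation, not taken from the chapter itself):

```latex
% Variance of \hat{\beta}_1 in y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon,
% where r_{12} is the sample correlation between x_1 and x_2.
\operatorname{Var}(\hat{\beta}_1)
  = \frac{\sigma^{2}}{\bigl(1 - r_{12}^{2}\bigr)\sum_{i=1}^{n}\bigl(x_{1i} - \bar{x}_{1}\bigr)^{2}}
```

As r₁₂² approaches 1, the factor (1 − r₁₂²) approaches zero, so the variance and the standard error grow without bound and the t-statistic shrinks toward zero.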
Aviation Forecasting and Regression Analysis
Published in Bijan Vasigh, Ken Fleming, Thomas Tacker, Introduction to Air Transport Economics, 2018
If multicollinearity is found to be a problem in a particular regression, the conventional remedies are to acquire more data or to eliminate one or more of the highly collinear independent variables. Since it is rarely possible to acquire more data for a given regression (due to time constraints and so forth), attention shifts to eliminating variables. If all the variables are still significant at conventional levels, it is generally advisable to retain the original model, since it represents our best initial theoretical formulation of the relationship. If, on the other hand, one or more of the collinear variables is not significant at conventional levels, consideration should be given to dropping the non-significant variable and rerunning the regression. In doing so, the researcher implicitly assumes that the two highly collinear variables provide the same information with respect to the dependent variable.
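A minimal sketch of the drop-and-rerun procedure described above, using statsmodels on synthetic data (the variable names and the data-generating process are illustrative assumptions, not taken from the chapter):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
income = rng.normal(50, 10, n)
gdp = income + rng.normal(0, 1, n)        # nearly collinear with income
traffic = 2.0 * income + rng.normal(0, 5, n)

# Full model: the collinear pair inflates standard errors, so one of the
# two regressors typically shows up as non-significant.
full = sm.OLS(traffic, sm.add_constant(np.column_stack([income, gdp]))).fit()
print(full.pvalues)

# Drop the non-significant collinear variable and rerun the regression.
reduced = sm.OLS(traffic, sm.add_constant(income)).fit()
print(reduced.pvalues)
```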
Data Reuse: A Powerful Data Mining Effect of the GenIQ Model
Published in Bruce Ratner, Statistical and Machine-Learning Data Mining, 2017
Data reuse is the appending of new variables, found when building a GenIQ Model, to the original dataset. Because the new variables are re-expressions of the original variables, the correlations between the original variables and the GenIQ data-mined variables are expectedly high. In the context of statistical modeling, the presence of highly correlated predictor variables is a condition known as multicollinearity. Its effects are inflated standard errors of the regression coefficients, unstable regression coefficients, an inability to validly assess the importance of the predictor variables, and, when the multicollinearity is severe, an indeterminate regression equation. The simplest, although inefficient, solution to the multicollinearity problem is guess-and-check: deleting suspect variables from the regression model.
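A hedged sketch of that guess-and-check deletion, flagging appended variables that are nearly collinear with an original one (the column names and the 0.95 cutoff are illustrative assumptions, not part of the GenIQ Model):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["geniq_1"] = 2.0 * df["x1"] + rng.normal(0, 0.01, size=200)  # re-expression of x1

originals, appended = ["x1", "x2"], ["geniq_1"]
corr = df.corr().abs()

# Suspect variables: appended columns carrying almost the same
# information as an original column.
suspects = [v for v in appended if corr.loc[v, originals].max() > 0.95]
df_reduced = df.drop(columns=suspects)
print("dropped:", suspects)
```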
Evaluation of machine learning approach for base and subgrade layer temperature prediction at various depths in the presence of insulation layers
Published in International Journal of Pavement Engineering, 2023
Yunyan Huang, Mohamad Molavi Nojumi, Shadi Ansari, Leila Hashemian, Alireza Bayat
Three primary input variables affect pavement temperature: day of the year, air temperature, and depth. When predicting pavement temperature, it is important to examine the correlations between these variables to ensure that multicollinearity is not present. Multicollinearity is a statistical phenomenon that occurs when two or more predictor variables are highly correlated with each other. It causes problems when estimating regression models because the coefficients of the correlated predictors become unstable and difficult to interpret. Depth is fixed for each measurement location and does not vary over time, so it cannot be highly correlated with the time-varying inputs. The correlation between air temperature and day of the year is 0.18, indicating a low correlation. Thus, no high correlation was found among these three inputs.
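A minimal sketch of that pairwise-correlation check with pandas; the synthetic inputs below only mimic the three predictors named in the paper and are not the authors' data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
day = rng.integers(1, 366, size=500)
inputs = pd.DataFrame({
    "day_of_year": day,
    "air_temp": 15 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 5, 500),
    "depth_m": np.tile([0.1, 0.3, 0.6, 0.9, 1.2], 100),  # fixed sensor depths
})
# Off-diagonal entries should all be low before fitting the models.
print(inputs.corr(method="pearson").round(2))
```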
Construct validity of gymnastics-specific assessment on the neuromuscular function of shoulder flexor and extensor muscles
Published in Sports Biomechanics, 2023
Dimitrios C. Milosis, Theophanis A. Siatras, Kosmas I. Christoulas, Dimitrios A. Patikas
The Shapiro-Wilk test confirmed the normal distribution of all variables. Box's test provided support for the equality of the variance-covariance matrices across groups, and Bartlett's test of sphericity supported that the variances were equal across groups. Multicollinearity emerged for some variables assessing the same parameter of neuromuscular function (PT, NME, or CI) and the same shoulder isokinetic action (flexion or extension). Specifically, very high correlations (r > 0.90) emerged between (a) PT_FLcon_60°/s and PT_FLcon_180°/s, (b) PT_EXecc_180°/s and PT_EXecc_300°/s, and (c) the variables assessing the NME of the flexor muscles during eccentric contraction. Multicollinearity affects the coefficients and p-values, but it does not influence the predictions, the precision of the predictions, or the goodness-of-fit statistics; it affects only the specific independent variables that are correlated (Tabachnick & Fidell, 2012). Because the purpose of the study was to evaluate how the parameters PT, NME, and CI can discriminate between participants of different gymnastics performance levels, it was decided to include all variables in the analyses. Six discriminant analyses were conducted for the parameters PT, NME, and CI, separately for concentric and eccentric contractions. Where collinearity was detected, it was taken into account in the discussion of the discriminant analysis results.
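A short sketch of the screening step the study reports, flagging variable pairs above the r > 0.90 cutoff before running the discriminant analyses (the DataFrame and its column names are hypothetical):

```python
from itertools import combinations
import numpy as np
import pandas as pd

def collinear_pairs(df: pd.DataFrame, threshold: float = 0.90):
    """Return (var_a, var_b, r) for every pair whose |r| exceeds the threshold."""
    corr = df.corr()
    return [(a, b, round(corr.loc[a, b], 3))
            for a, b in combinations(df.columns, 2)
            if abs(corr.loc[a, b]) > threshold]

rng = np.random.default_rng(3)
base = rng.normal(size=60)
pt = pd.DataFrame({
    "PT_FLcon_60": base,
    "PT_FLcon_180": base + rng.normal(0, 0.1, 60),  # near-duplicate measure
    "PT_EXcon_60": rng.normal(size=60),
})
print(collinear_pairs(pt))  # e.g. [('PT_FLcon_60', 'PT_FLcon_180', 0.995)]
```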
The effect of live streaming commerce quality on customers’ purchase intention: extending the elaboration likelihood model with herd behaviour
Published in Behaviour & Information Technology, 2023
In addition to CMB, we checked the normality of the distributions of all the major research variables. Hair et al. (2010) and Byrne (2013) argued that data can be considered normal if skewness is between −2 and +2 and kurtosis is between −7 and +7. The values of skewness in this study range from −1.436 to −1.026, and the values of kurtosis range from 1.206 to 3.12, indicating an acceptable level. Therefore, we retained the variables for the subsequent analyses. The next step is a multicollinearity test, which determines whether the predictor variables in the proposed model are strongly correlated. Multicollinearity refers to a situation in which two or more predictor variables are strongly correlated and therefore contribute little distinct information to the regression model. When the degree of correlation between variables is sufficiently strong, it can create difficulties in both fitting and interpreting the model. One method for detecting multicollinearity is the variance inflation factor (VIF), a statistic that measures how much the variance of an estimated regression coefficient is inflated by correlation among the predictors. Following Liang et al. (2012) and Leong, Jaafar, and Ainin (2018), we evaluated multicollinearity via the VIF and collinearity tolerance. Hair et al. (2016) and Gao et al. (2021) suggested that a VIF below 10 together with a tolerance above 0.10 indicates no multicollinearity problem. The VIFs of the items measured in the current study range from 1.376 to 1.726, so multicollinearity is unlikely to be a problem.
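A minimal sketch of a VIF/tolerance check against those thresholds, using statsmodels (the construct names in the DataFrame are illustrative assumptions, not the study's measurement items):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
X = pd.DataFrame(rng.normal(size=(300, 4)),
                 columns=["info_quality", "interaction", "trust", "herd"])
X_const = sm.add_constant(X)

# Index 0 is the constant, so predictor columns start at 1.
for i, name in enumerate(X.columns, start=1):
    vif = variance_inflation_factor(X_const.values, i)
    # Rule of thumb cited above: VIF < 10 and tolerance > 0.10 -> no problem.
    print(f"{name}: VIF = {vif:.3f}, tolerance = {1 / vif:.3f}")
```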