Multiple Linear Regression
Published in Principles of Biostatistics, 2022
Marcello Pagano, Kimberlee Gauvreau, Heather Mattie
Irrespective of the strategy we choose to fit a particular model, we should always check for the presence of collinearity. Collinearity occurs when two or more of the explanatory variables are correlated to the extent that they convey essentially the same information about the observed variation in y. One symptom of collinearity is the instability of the estimated coefficients and their standard errors. In particular, the standard errors often become very large; this implies that there is a great deal of sampling variability in the estimated coefficients.
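This symptom is easy to reproduce in simulation. The following Python sketch, not from the chapter itself and with all data and names invented, fits the same outcome once with two independent predictors and once with two nearly collinear ones; the coefficient standard errors balloon in the second fit:

```python
# Sketch of unstable coefficients under collinearity; all data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                      # unrelated to x1
x2_collin = x1 + rng.normal(scale=0.05, size=n)    # nearly identical to x1
y = 1.0 + 2.0 * x1 + 0.5 * x2_indep + rng.normal(size=n)

for label, x2 in [("independent x2", x2_indep), ("collinear x2", x2_collin)]:
    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    # with the collinear x2, the standard errors on x1 and x2 balloon
    print(label, "-> standard errors:", np.round(fit.bse, 3))
```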
All Things Being Equal – But How? (Designing the Study)
Published in Statistical Reasoning for Surgeons, 2020
Mitchell G. Maltenfort, Camilo Restrepo, Antonia F. Chen
Further, if two predictors are correlated with each other, the model may not be able to detect whether either one of them is actually associated with the outcome, and neither one may appear to be useful. This phenomenon is called collinearity, and those readers fondly remembering linear algebra may be thinking of the consequences of a system of equations being of less than full rank. Having collinear predictors won’t ruin a model’s ability to predict outcomes, but it can make the model coefficients nonsensical. We remember one study in which the predictors were highly correlated, and we had to discuss with the researcher which measurements should be retained; the criteria for keeping a predictor included how noisy each measurement was and how likely other clinicians were to use it. You can check for collinearity in any of the following ways: (a) test statistically for correlations between predictors; (b) systematically add and remove predictors and see whether the other coefficient estimates change; or (c) use the “variance inflation factor” (VIF), which some statistical packages can return and which will tell you which parameters are potentially affected by correlations with other parameters.
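Checks (a) and (c) are easy to script. Here is a minimal sketch, assuming the pandas and statsmodels packages and an invented data frame of predictors (age, height, weight, and the BMI derived from them, which is collinear by construction):

```python
# Sketch of checks (a) and (c); all variable names and data are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({"age": rng.normal(60, 10, n),
                   "height_cm": rng.normal(170, 8, n)})
df["weight_kg"] = 0.9 * df["height_cm"] - 85 + rng.normal(0, 5, n)
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2  # derived, so collinear

# (a) pairwise correlations between predictors
print(df.corr().round(2))

# (c) variance inflation factors; VIF > 10 is a common warning threshold
X = sm.add_constant(df)
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X.values, i), 1))
```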
Predictive Modeling with Supervised Machine Learning
Published in Altuna Akalin, Computational Genomics with R, 2020
Highly correlated predictors can lead to collinearity issues, which can greatly increase the model variance, especially in the context of regression. When such relationships hold among multiple predictor variables, this is called multicollinearity. Having correlated variables results in unnecessarily complex models with more predictor variables than necessary. From a data collection point of view, spending time and money collecting correlated variables can be a waste of effort. For linear regression, or models based on regression, the collinearity problem is more severe because it creates unstable models in which statistical inference becomes difficult or unreliable. On the other hand, correlation between variables may not be a problem for predictive performance if the correlation structure in the training set and in future test sets is the same. More often, however, correlated structure within the training set can lead to overfitting.
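One common remedy, sketched below in Python rather than the R used in the book, is to screen predictor pairs by correlation and drop one member of each highly correlated pair before fitting; the threshold, function name, and data here are all illustrative:

```python
# Sketch: greedily drop one column from each highly correlated pair.
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    corr = df.corr().abs()
    # keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(5)
n = 100
data = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
data["x3"] = 2 * data["x1"] + rng.normal(0, 0.01, n)   # redundant copy of x1
print(drop_correlated(data).columns.tolist())           # -> ['x1', 'x2']
```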
Earlier continuous renal replacement therapy is associated with reduced mortality in rhabdomyolysis patients
Published in Renal Failure, 2022
Xiayin Li, Ming Bai, Yan Yu, Feng Ma, Lijuan Zhao, Yajuan Li, Hao Wu, Lei Zhou, Shiren Sun
Categorical variables were described as frequencies and percentages and compared using the chi-square test or Fisher’s exact test. Normally distributed continuous variables were expressed as mean ± standard deviation (SD) and compared using Student’s t-test. Continuous variables with non-normal distributions were expressed as the interquartile range (IQR; 25th to 75th percentiles) and compared using the Mann–Whitney U-test. Independent risk factors for 90-day mortality were identified using logistic regression analysis. Collinearity diagnostics were employed to eliminate highly related variables. Variables with extremely skewed distributions were logarithmically transformed for the analysis. Kaplan–Meier curves were used to describe patients’ cumulative survival, and intergroup comparisons were performed using the log-rank test. Subgroup analyses were performed according to patient etiology, AKI stage, APACHE II score, and SOFA score. All statistics were calculated using IBM SPSS Statistics version 22 (IBM Corp., Armonk, NY). A two-sided p value <0.05 was considered statistically significant.
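As an illustration of the pipeline described above, here is a minimal Python sketch (the study itself used SPSS; the data frame, column names, and effect sizes are hypothetical) covering the chi-square test, Student's t-test, the Mann–Whitney U-test, a logarithmic transformation of a skewed variable, and logistic regression:

```python
# Sketch of the described tests; the study used SPSS, and these data are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 120
df = pd.DataFrame({
    "died_90d": rng.integers(0, 2, n),          # outcome
    "male": rng.integers(0, 2, n),              # categorical
    "age": rng.normal(55, 12, n),               # roughly normal
    "creatine_kinase": rng.lognormal(8, 1, n),  # extremely skewed
})
dead, alive = df[df.died_90d == 1], df[df.died_90d == 0]

# categorical variable: chi-square test on the contingency table
chi2, p_chi, _, _ = stats.chi2_contingency(pd.crosstab(df.male, df.died_90d))

# normal continuous variable: Student's t-test; skewed variable: Mann-Whitney U
p_t = stats.ttest_ind(dead.age, alive.age).pvalue
p_u = stats.mannwhitneyu(dead.creatine_kinase, alive.creatine_kinase).pvalue

# log-transform the skewed variable, then multivariable logistic regression
df["log_ck"] = np.log(df.creatine_kinase)
fit = sm.Logit(df.died_90d,
               sm.add_constant(df[["male", "age", "log_ck"]])).fit(disp=0)
print(round(p_chi, 3), round(p_t, 3), round(p_u, 3))
print(fit.params.round(3))
```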
Negative anti-phospholipase A2 receptor antibody status at three months predicts remission in primary membranous nephropathy
Published in Renal Failure, 2022
Gabriel Stefan, Simona Stancu, Adrian Zugravu, Otilia Popa, Dalia Zubidat, Nicoleta Petre, Gabriel Mircescu
Survival analyses were conducted with the Kaplan–Meier method, and the log-rank test was used for comparisons. Univariate and multivariate Cox proportional hazards analyses were performed to identify independent predictors of the primary and secondary endpoints. If a candidate predictor's p value in the univariate survival analysis was <.05, the predictor was included in the multivariable Cox regression model. Results were expressed as hazard ratios (HR) with 95% confidence intervals (CI). Moreover, we used two methods to test for collinearity among our predictor variables: (i) the variance inflation factor (VIF), where VIF <10 is desirable; and (ii) the absolute value of the correlation coefficients, where |r| or |rs| <0.7 is desirable. There was no significant collinearity between the variables used in the Cox proportional hazards models.
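A minimal sketch of this selection procedure, using the lifelines Python package and invented data, follows: the |r| < 0.7 correlation screen, the univariable Cox screen at p < .05, and then the multivariable model (the VIF check would proceed as in the earlier sketch):

```python
# Sketch of the selection and collinearity checks; uses the lifelines package,
# and all column names, data, and effect sizes are invented.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 200
age = rng.normal(55, 12, n)
proteinuria = rng.lognormal(1, 0.6, n)
egfr = rng.normal(60, 20, n)

# simulate follow-up: hazard rises with age and proteinuria; censor at 60 months
hazard = 0.02 * np.exp(0.02 * (age - 55) + 0.3 * np.log(proteinuria))
time = rng.exponential(1 / hazard)
df = pd.DataFrame({"time": np.minimum(time, 60),
                   "event": (time < 60).astype(int),
                   "age": age, "proteinuria": proteinuria, "egfr": egfr})

candidates = ["age", "proteinuria", "egfr"]
print(df[candidates].corr().abs().round(2))    # screen: |r| < 0.7 desirable

kept = []                                      # univariable screen at p < .05
for var in candidates:
    fit = CoxPHFitter().fit(df[["time", "event", var]],
                            duration_col="time", event_col="event")
    if fit.summary.loc[var, "p"] < 0.05:
        kept.append(var)

if kept:                                       # multivariable model on survivors
    CoxPHFitter().fit(df[["time", "event"] + kept],
                      duration_col="time", event_col="event").print_summary()
```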
Corneal Eccentricity in a Rural Japanese Population: The Locomotive Syndrome and Health Outcome in Aizu Cohort Study (LOHAS)
Published in Ophthalmic Epidemiology, 2022
Yuto Yoshida, Koichi Ono, Takatoshi Tano, Yoshimune Hiratsuka, Koji Otani, Miho Sekiguchi, Shinichi Konno, Shinichi Kikuchi, Masakazu Yamada, Shunichi Fukuhara, Akira Murakami
Simple and multivariate linear regression analyses were used to estimate associations between eccentricity and other factors. We adjusted for the following possible confounding factors, based on previous studies: age, gender, body mass index (BMI), spherical equivalent, pupil diameter, anterior chamber angle, anterior chamber volume, and central corneal thickness. Stepwise linear regression was also performed, with p < .20 as the selection criterion, so that its results could be compared with those from the simple and multivariate linear regression analyses. Model comparison between multivariate linear regression and stepwise linear regression was performed using Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC),21,22 with lower AIC or BIC values indicating a better fit. Collinearity statistics were used to assess possible collinearity between covariates; all values were below the critical threshold (variance inflation factor <10). For categorical variables, proportions were calculated for all participants included in the analysis. For continuous variables, the mean ± standard deviation (SD) was calculated across all participants included in the analysis, and we also calculated the mean ± SD eccentricity. We used STATA/SE 15.0 for Mac (Stata Corp, College Station, TX, USA) for the analyses, and p < .05 was considered statistically significant.
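The AIC/BIC comparison step can be illustrated as follows; this Python sketch with statsmodels stands in for the Stata analysis, and all variables and effect sizes are invented:

```python
# Sketch of the AIC/BIC model comparison; stands in for the Stata analysis,
# with invented variables and effect sizes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
age = rng.normal(70, 8, n)
bmi = rng.normal(23, 3, n)                         # plays no role in the outcome
cct = rng.normal(540, 30, n)                       # central corneal thickness
eccentricity = 0.5 + 0.002 * age - 0.0003 * cct + rng.normal(0, 0.05, n)

full = sm.OLS(eccentricity,
              sm.add_constant(np.column_stack([age, bmi, cct]))).fit()
reduced = sm.OLS(eccentricity,
                 sm.add_constant(np.column_stack([age, cct]))).fit()

# lower AIC or BIC indicates the better-fitting model
print("full:    AIC %.1f  BIC %.1f" % (full.aic, full.bic))
print("reduced: AIC %.1f  BIC %.1f" % (reduced.aic, reduced.bic))
```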