Develop
Published in Walter R. Paczkowski, Deep Data Analytics for New Product Development, 2020
The X matrix is of order P × L, where L is the total number of levels across all attributes. If, for example, each attribute has two levels, then X has size P × A, where A is the number of attributes. The number of attributes can be large enough that it is safe to assume P << A; that is, the number of observations, P, is much less than the number of variables, A. This is a problem for OLS, which requires P > A for estimation, so in this situation estimation is impossible. In addition, the attributes are usually correlated to some degree which, despite the use of an encoding scheme for any one attribute, still introduces multicollinearity. The collinearity jeopardizes estimation by making the estimated parameters unstable, possibly with the wrong signs and magnitudes. Two ways to handle the model in (3.3) are partial least squares (PLS) estimation and neural networks.
Machine learning solutions for development of performance deterioration models of flexible airfield pavements
Published in Inge Hoff, Helge Mork, Rabbira Garba Saba, Eleventh International Conference on the Bearing Capacity of Roads, Railways and Airfields, Volume 3, 2022
A.Z. Ashtiani, S. Murrell, R. Speir, D.R. Brill
The historical environmental/climate data are the most comprehensive feature in the PA40 database. Researchers initially identified thirteen (13) environmental variables that may influence pavement performance (Table 1). Environmental variables were calculated as average values between the last rehabilitation/construction date and inspection, or between two inspections, for each pavement section in the database. Environmental variables exhibited temporal behavior, i.e., their values were not constant over the years the performance data were collected. This indicates that in some years the pavements were exposed to more severe weather conditions than in other years, which could accelerate pavement deterioration. Given that flexible pavement performance data were available for only 10 runways at 8 airports, there were more independent variables than climate locations. This may pose challenges in developing reasonable predictive models. It is therefore desirable to select or construct a subset of environmental parameters that are useful for building a good prediction model. A common problem in ML model development is predictors that are highly correlated. In this case, one or more of the highly correlated predictors can be omitted because no additional information is gained by including them (Guyon and Elisseeff, 2003). High collinearity may negatively affect the prediction performance of the model. However, the major problem with high collinearity is that it makes it difficult for the predictive model to assess the relative importance of the predictors with respect to the target. To this end, researchers examined the temporal dependency among the environmental variables based on yearly fluctuations. In addition, researchers performed various collinearity tests to explore the correlation between the environmental variables. One collinearity test was to create Pearson correlation matrices to examine the strength of pairwise linear relationships.
Correlations exceeding a threshold of 0.7 were considered “high collinearity.” Based on this threshold, “Days Precipitation”, “Total Precipitation” and “Thornthwaite Index” are highly positively correlated to one another. Also, “Freeze-Thaw Cycles” is highly negatively correlated to “Average Daily Temperature” and positively to “Freezing Degree Days.”
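The threshold screening described above can be sketched as follows. The variable names and data are fabricated stand-ins, not the PA40 database; only the 0.7 cutoff comes from the study:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200

# Hypothetical proxies for environmental variables, built so that two pairs
# are strongly correlated by construction.
temp = rng.normal(size=n)
df = pd.DataFrame({
    "avg_daily_temp": temp,
    "freeze_thaw_cycles": -temp + rng.normal(scale=0.3, size=n),  # negative link
    "total_precip": rng.normal(size=n),
})
df["days_precip"] = df["total_precip"] + rng.normal(scale=0.2, size=n)  # positive link

corr = df.corr(method="pearson")

# Flag every pair whose |r| exceeds the 0.7 "high collinearity" threshold.
pairs = [(a, b, corr.loc[a, b])
         for i, a in enumerate(corr.columns)
         for b in corr.columns[i + 1:]
         if abs(corr.loc[a, b]) > 0.7]
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:+.2f}")
```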
Potentiality of tree variables as predictors in pavement roughness progression rate modelling
Published in Australian Journal of Civil Engineering, 2022
A factor that can compromise the effectiveness of multiple linear regression analysis is the presence of multi-collinearity among the predictors (Mansfield and Helms 1982). In statistics, multi-collinearity is a phenomenon that refers to high correlation between two or more predictor variables. In other words, if multi-collinearity among the predictor variables exists, one variable can be linearly predicted from the others with a substantial degree of accuracy, which ultimately compromises the predictability of the regression equation. Multi-collinearity among the predictors can be identified using Tolerance (T) and the Variance Inflation Factor (VIF), as presented in equations 2 and 3.
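Equations 2 and 3 are not reproduced in this excerpt; the standard definitions are T_j = 1 − R_j² and VIF_j = 1/T_j, where R_j² is from regressing predictor j on the remaining predictors. A sketch with fabricated data (the VIF > 10 rule of thumb is a common convention, not from this article):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def tolerance_and_vif(X):
    """For each predictor j, regress it on the others:
    Tolerance T_j = 1 - R_j^2, VIF_j = 1 / T_j."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        tol = 1.0 - r2
        out.append((tol, 1.0 / tol))
    return out

# Hypothetical predictors: x2 is nearly a linear combination of x0 and x1,
# so its tolerance should be near 0 and its VIF very large.
rng = np.random.default_rng(2)
x0, x1 = rng.normal(size=200), rng.normal(size=200)
x2 = x0 + x1 + rng.normal(scale=0.05, size=200)
diagnostics = tolerance_and_vif(np.column_stack([x0, x1, x2]))
for j, (tol, vif) in enumerate(diagnostics):
    print(f"x{j}: T = {tol:.3f}, VIF = {vif:.1f}")
```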
Recursive pseudo fatigue cracking damage model for asphalt pavements
Published in International Journal of Pavement Engineering, 2021
Kenneth A. Tutu, David H. Timm
A Pearson correlation matrix and pairwise scatter plots depicted the correlation among ten potentially relevant predictor variables. After a careful review, three variables that were significantly correlated to the response variable (BETA) and were not correlated to each other (non-collinear) were selected for building the β-parameter regression equation. Collinearity hinders the estimation of the unique effects of predictor variables, yields sensitive regression coefficients and generates large sampling errors of regression coefficients (Chatterjee and Hadi 2012). The selected predictors were strain level, initial AC modulus and FEL. Note that FEL and ε0 (Equations 5b and 5c) both refer to fatigue endurance limit. The regression analysis proceeded to develop a relationship between the β-parameter (BETA) and the selected predictors, as per Equation 6, where BETA = fatigue damage-related parameter; STR = induced tensile strain, µε; E0 = initial AC modulus, ksi; and FEL = fatigue endurance limit, µε.
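The screening step described above (keep predictors correlated with the response but not with each other) can be sketched as a greedy pass over the correlation matrix. The data, variable names other than STR/E0/FEL/BETA, and both thresholds are fabricated for illustration:

```python
import numpy as np
import pandas as pd

def select_noncollinear(df, target, resp_thresh=0.3, pair_thresh=0.7):
    """Greedy sketch: rank predictors by |r| with the response; keep a
    candidate only if it clears resp_thresh and its |r| with every
    already-kept predictor is below pair_thresh."""
    corr = df.corr(method="pearson")
    candidates = corr[target].drop(target).abs().sort_values(ascending=False)
    kept = []
    for var, r in candidates.items():
        if r < resp_thresh:
            break
        if all(abs(corr.loc[var, k]) < pair_thresh for k in kept):
            kept.append(var)
    return kept

# Fabricated data: STR_dup is nearly collinear with STR, so only one survives.
rng = np.random.default_rng(3)
n = 300
strain, modulus, fel = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
df = pd.DataFrame({
    "STR": strain,
    "E0": modulus,
    "FEL": fel,
    "STR_dup": strain + rng.normal(scale=0.1, size=n),
    "BETA": 0.8 * strain - 0.5 * modulus + 0.4 * fel + rng.normal(scale=0.2, size=n),
})
selected = select_noncollinear(df, "BETA")
print(selected)
```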
Assessment of pile drivability using random forest regression and multivariate adaptive regression splines
Published in Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, 2021
Wengang Zhang, Chongzhi Wu, Yongqin Li, Lin Wang, P. Samui
Before running a regression analysis, it is necessary to check whether the feature variables are correlated with one another: strong collinearity leads to unstable modelling results. Figure 3 is the Spearman rank correlation coefficient matrix heatmap of the feature variables and label variables (|R| = 0 indicates no correlation; |R| < 0.4 a weak correlation; 0.4 < |R| < 0.75 a correlation; 0.75 < |R| < 1 a strong correlation; |R| = 1 full correlation). When the correlation between feature variables is strong (close to 1), it is called multicollinearity, which affects the modelling results, biasing the model or rendering it unusable; examples in Figure 3 are (x1 with x2), (x7 and x8 with x11) and (x9 with x10). The simultaneous presence of these feature variables reduces the efficiency of modelling, making it necessary to implement feature selection first.
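The Spearman check and the banding above can be sketched with `scipy.stats.spearmanr`. The data are fabricated; only the band boundaries come from the article:

```python
import numpy as np
from scipy.stats import spearmanr

def band(r):
    """Bands used in the article's heatmap reading."""
    a = abs(r)
    if a == 0:
        return "uncorrelated"
    if a < 0.4:
        return "weak"
    if a < 0.75:
        return "correlation"
    if a < 1:
        return "strong"
    return "fully correlated"

rng = np.random.default_rng(4)
n = 150
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.2, size=n)  # hypothetical stand-in for the (x1, x2) pair
x9 = rng.normal(size=n)                  # independent of the others

# spearmanr on an (n, k) array returns the k-by-k rank correlation matrix;
# rank-based correlation is robust to monotone nonlinear relationships.
rho, _ = spearmanr(np.column_stack([x1, x2, x9]))
print(band(rho[0, 1]), band(rho[0, 2]))
```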