Effect of preservation treatments on pavement performance
Published in Maurizio Crispino, Pavement and Asset Management, 2019
James Bryce, Gonzalo Rada, Gary Hicks
The change in performance in terms of rutting and IRI was calculated using the LTPP and Virginia DOT data. The approach to estimating changes in performance was to fit a robust regression line to the condition data following the overlay, and to define the average growth rate of rutting and IRI as the slope of that line. Robust regression is a family of techniques designed to reduce the effect of potential outliers on the results of regression (Gelman & Hill, 2007). It was assumed that a linear model could be used to estimate the growth rates of rutting and IRI, given the relatively small number of years of data available following the treatment (6 to 10 years). Once the growth rate following the thin overlay was found, regression was used to compare the growth rate of rutting and IRI to the other independent variables that were collected.
Artificial Intelligence for Network Operations
Published in Mazin Gilbert, Artificial Intelligence for Autonomous Networks, 2018
Finally, on the application analytics side, it is important to explicitly represent and handle missing and erroneous data. A common mistake is to carelessly treat missing data as a data value of zero, polluting the result and potentially wreaking havoc with the network. Similarly, data that are clearly erroneous, such as CPU values exceeding 100% utilization or link utilization exceeding capacity, must be appropriately identified and handled (rounded or dropped entirely). Catching these data errors early minimizes so-called “garbage-in, garbage-out” effects. Furthermore, depending on the application, it can be helpful to leverage robust analytics where applicable: median statistics are less sensitive to occasional erroneous data than mean statistics, and robust regression is a better alternative to least-squares regression when data are contaminated with outliers or influential observations.
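A small illustration of both points, with fabricated CPU-utilization readings from a hypothetical poller:

```python
import numpy as np

# Hypothetical CPU-utilization samples (%); two readings are clearly
# erroneous (a glitching sensor reporting > 100%).
cpu = np.array([41.0, 43.5, 39.8, 250.0, 42.2, 44.1, 40.7, 300.0])

# Identify impossible values explicitly and drop them rather than letting
# them pollute downstream statistics.
valid = cpu[(cpu >= 0) & (cpu <= 100)]

print(f"mean of raw data:    {cpu.mean():.1f}")      # badly inflated
print(f"median of raw data:  {np.median(cpu):.1f}")  # barely affected
print(f"mean after cleaning: {valid.mean():.1f}")
```

The raw mean is roughly doubled by the two bad samples, while the median lands almost exactly where the cleaned mean does.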
Ulysses’ Compass
Published in Richard McElreath, Statistical Rethinking, 2020
One way to both use these extreme observations and reduce their influence is to employ some kind of robust regression. A “robust regression” can mean many different things, but usually it indicates a linear model in which the influence of extreme observations is reduced. A common and useful kind of robust regression is to replace the Gaussian model with a thicker-tailed distribution like Student’s t (or “Student-t”) distribution. This distribution has nothing to do with students. The Student-t distribution arises from a mixture of Gaussian distributions with different variances. If the variances are diverse, then the tails can be quite thick.
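A minimal sketch of the idea (this is not McElreath's code: the data are synthetic, and a scipy maximum-likelihood fit stands in for his Bayesian machinery). Swapping the Gaussian likelihood for a Student-t with few degrees of freedom lets the fit shrug off an extreme observation:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)

# Synthetic data: a linear trend plus one extreme observation.
x = np.linspace(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, size=x.size)
y[5] += 8.0  # a single wild point

# Ordinary least squares = Gaussian likelihood (thin tails).
b_ols, a_ols = np.polyfit(x, y, 1)

def neg_log_lik(params, df=2.0):
    a, b, log_s = params
    resid = y - (a + b * x)
    # Student-t likelihood with nu = 2: thick tails, so the extreme
    # residual contributes little to the total.
    return -stats.t.logpdf(resid, df, scale=np.exp(log_s)).sum()

res = optimize.minimize(neg_log_lik, x0=[a_ols, b_ols, np.log(0.5)],
                        method="Nelder-Mead")
a_hat, b_hat, _ = res.x
print(f"t-likelihood slope: {b_hat:.3f}  OLS slope: {b_ols:.3f}")
```

The true slope is 0.5; the Gaussian fit is pulled away from it by the outlier, while the Student-t fit stays close.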
For Love or Money? Examining Reasons behind OSS Developers’ Contributions
Published in Information Systems Management, 2022
Joseph Taylor, Ramakrishna Dantu
A multiple linear regression model was used to examine the hypothesized effects. Models A–G are analyzed using robust regression, an alternative to least squares that controls for potential outliers or unusually influential observations (Rousseeuw & Leroy, 2005). The data are normally distributed, and variance inflation factors did not exceed 1.15, well below the 2.5 threshold that would indicate potential multicollinearity concerns. Given the normal distribution of the data and the low VIFs, no transformations were applied to the data prior to regression analysis. Using this approach, the variables that have a significant relationship with the dependent variable are identified. Given the size of the data set, listwise deletion was used for records that did not include all relevant data for a model.
The value of incremental environmental sustainability innovation in the construction industry: an event study
Published in Construction Management and Economics, 2021
Linh N. K. Duong, Jason X. Wang, Lincoln C. Wood, Torsten Reiners, Mona Koushan
We used robust regression analysis in this study. Following Fox (2015), we identified outliers (studentized residuals outside the range [-2, 2]) and influential observations (hat-values over three times the average) in our dataset; traditional OLS regression relies on a strong assumption that outliers are absent. The outlying observations in our dataset would bias the OLS estimates of the coefficients and their statistical significance, yet removing them would reduce the sample size and the explanatory power of the model. Robust regression, using weighted least squares, can instead accommodate the outlying observations while minimizing their influence, which preserves the information in the dataset's variance and improves the reliability of the results (Johnson 2002). We used the MM-estimator in the robust regression analysis, following Fox (2015). We tested for multicollinearity in the regression model using variance inflation factors (VIF); all VIF values were less than 3, providing evidence of low multicollinearity (Cohen et al. 2013).
Robust AFT-based monitoring procedures for reliability data
Published in Quality Technology & Quantitative Management, 2020
Shervin Asadzadeh, Arash Baghaei
Also, the values of the quality characteristic for the second stage are prone to outliers. To adjust for the effect of outlier data on the regression model, a robust regression method has been implemented. Robust regression is a technique to decrease the detrimental effect of outlying observations. No robust regression method can be accepted as the best in general: different methods show different strengths and weaknesses depending on the percentage of outliers and the position of the explanatory and passive variables, and the differences among the existing methods relate to three features, namely the breakdown point, efficiency, and bounded influence. M-estimators are the most common of the robust regression techniques and are preferable when outliers are present only in the response observations. If the data set is displayed graphically, outlying points will lie far from the other values, so the easiest check is to draw a scatter plot of the data and inspect it for outliers in the response observations. Alternatively, boundaries can be determined by computing the interquartile range (IQR), multiplying it by 1.5, and adding the result to the upper quartile and subtracting it from the lower quartile; this is another useful way to check for outlying observations in the input or output quality variables. Such checks should be done before implementing the M-estimator for parameter estimation, to be sure that only the response observations are prone to outliers. Otherwise, if there are outliers in both variables, compound estimators should be applied, which are not discussed in this paper. M-estimation finds the minimum of a function ρ of the residuals (r) and moderates the influential observations by applying appropriate weights (W) to each observation (Maronna, Martin, & Yohai, 2006). The two common M-estimator functions are the Huber and the Tukey bisquare, which diminish the effect of the outliers on the regression coefficients.
Their objective and weight functions are given in Table 1.
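The contrasting behavior of the two weight functions is easy to see numerically. The residual values below are made up; the tuning constants are the statsmodels defaults (Huber t = 1.345, bisquare c = 4.685), which match the commonly tabulated choices:

```python
import numpy as np
from statsmodels.robust.norms import HuberT, TukeyBiweight

# Hypothetical scaled residuals: mostly moderate, two extreme values.
r = np.array([-0.5, 0.2, 1.0, 2.0, 4.0, 8.0])

huber = HuberT()         # default tuning constant t = 1.345
tukey = TukeyBiweight()  # default tuning constant c = 4.685

# M-estimation weights W(r): Huber caps the influence of large residuals
# (weight t/|r|), while Tukey's bisquare rejects them entirely (weight 0
# beyond c).
print("Huber weights:", np.round(huber.weights(r), 3))
print("Tukey weights:", np.round(tukey.weights(r), 3))

# The IQR fences mentioned above, applied to the same values.
q1, q3 = np.percentile(r, [25, 75])
iqr = q3 - q1
print("IQR fences:", (q1 - 1.5 * iqr, q3 + 1.5 * iqr))
```

Both norms leave small residuals at full weight; for the residual of 8, Huber still assigns a small positive weight while the bisquare assigns exactly zero, which is the bounded-influence distinction the paragraph describes.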