Explore chapters and articles related to this topic
Exploratory Data Analysis and Data Visualization
Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022
With the advance of better algorithms in data science, the first three components of EDA have become less important. Specifically, residual analysis essentially involves fine-tuning the model by checking errors in a feedback loop. Today it can be automatically performed in gradient boosting and other machine learning methods. In gradient boosting, the algorithm checks the discrepancy between the model and the data in each attempt, gradually reducing the bias in a sequence of models. By the same token, data re-expression or data transformation can be accomplished in the hidden layer of artificial neural networks; the predictive accuracy is very impressive. In addition, many data science procedures are built to handle messy data, such that the use of resistant estimators is no longer necessary. As such, the remaining sections of this chapter will focus on data visualization, which remains indispensable today.
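The residual feedback loop described above can be illustrated with a minimal sketch. It assumes squared-error loss, one-dimensional inputs, and one-split regression stumps as the sequence of models; the names `fit_stump` and `boost`, the toy sine data, and the learning rate are illustrative assumptions, not taken from the chapter.

```python
import math

def fit_stump(x, r):
    """Fit a one-split regression stump to the residuals r.

    Returns (threshold, left_mean, right_mean) minimising squared error.
    """
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(x, y, rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss: each round fits the current residuals."""
    n = len(y)
    pred = [sum(y) / n] * n          # start from the mean of y
    mse_history = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]   # model-data discrepancy
        mse_history.append(sum(e * e for e in resid) / n)
        t, lm, rm = fit_stump(x, resid)
        # Add a damped correction from the new stump to the running prediction.
        pred = [pi + learning_rate * (lm if xi <= t else rm)
                for xi, pi in zip(x, pred)]
    mse_history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / n)
    return pred, mse_history

# Toy data (assumed for illustration only).
x = [i / 10 for i in range(40)]
y = [math.sin(xi) for xi in x]
pred, mse_history = boost(x, y)
print(f"MSE before boosting: {mse_history[0]:.4f}, after: {mse_history[-1]:.6f}")
```

Each round only ever sees the residuals of the model built so far, which is the sense in which residual checking is performed automatically.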
Importance of Normality Testing, Parametric and Non-Parametric Approach, Association, Correlation and Linear Regression (Multiple & Multivariate) of Data in Food & Bio-Process Engineering
Published in Surajbhan Sevda, Anoop Singh, Mathematical and Statistical Applications in Food Engineering, 2020
Here, our main goal is to minimize the error by fitting a best-fit line using the method of least squares. Residuals are the differences between the observed values and the predicted values. For multiple linear regression, the regression equation is given as follows (Johnson and Wichern, 2002): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_r X_r + \varepsilon$
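As a rough illustration of this regression equation, the sketch below estimates the coefficients by solving the normal equations and then computes the residuals. The helper names `solve` and `ols`, the two-predictor data set, and the noise values are assumptions made for the example; a numerical library would normally be used instead of hand-written elimination.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least-squares fit of y = b0 + b1*X1 + ... + br*Xr; returns (beta, residuals)."""
    rows = [[1.0] + list(r) for r in X]      # prepend the intercept column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    resid = [yi - fi for yi, fi in zip(y, fitted)]   # observed minus predicted
    return beta, resid

# Hypothetical data: y = 1 + 2*X1 - 3*X2 plus small fixed perturbations.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15]
y = [1 + 2 * a - 3 * b + e for (a, b), e in zip(X, noise)]
beta, resid = ols(X, y)
print("coefficients:", [round(b, 3) for b in beta])
print("residuals:   ", [round(e, 3) for e in resid])
```

With an intercept in the model, the least-squares residuals sum to zero and are orthogonal to each predictor, which is a quick sanity check on any fit.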
Evaluation of Surface Roughness in Turning with Precision Feed for Carbon Fibre-Reinforced Plastic Composites Using Response-Surface Methodology and Fuzzy Logic Modelling
Published in Kishore Debnath, Inderdeep Singh, Primary and Secondary Manufacturing of Polymer Matrix Composites, 2018
K. Palanikumar, T. Rajasekaran
The residual plots show the relation between the developed model and its error, and they give additional information about the model development process. The closeness of the developed model to the data is indicated by the normal probability plot (Figure 11.4a). If almost all the points follow the straight line, the model is said to be stable. An abnormal distribution of the points indicates model inadequacy. In the present investigation, the points follow an almost straight line, which indicates that the model is effective. Figure 11.4b shows the relation between the externally studentised residuals and the predicted values; the residuals are distributed within the limits, and hence the model is said to be adequate. Figure 11.4c shows the relation between the residuals and the run number. The values are distributed in both the positive and negative directions, which shows that the experimental results are distributed evenly. The relation between the predicted and actual results is presented in Figure 11.4d, which shows close agreement between the developed model and the experimental values. As a whole, the diagnostic results do not show any model inadequacy, and hence the developed models are effective for predicting the surface roughness in machining of CFRP composites.
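For readers unfamiliar with externally studentised residuals, the following sketch computes them for a simple one-predictor regression. It is an assumption-laden illustration: the chapter's model is a response-surface model, whereas here the data, the injected outlier, and the function name `externally_studentised` are hypothetical. Each residual is scaled by an error estimate computed with that observation left out, so an unusual point cannot mask itself.

```python
import math

def externally_studentised(x, y):
    """Externally studentised residuals for the fit y = b0 + b1*x."""
    n, p = len(x), 2                       # p = number of fitted coefficients
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    lever = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]   # hat-matrix diagonal
    sse = sum(e * e for e in resid)
    out = []
    for e, h in zip(resid, lever):
        # Error variance estimated with this observation deleted.
        s2_del = (sse - e * e / (1 - h)) / (n - p - 1)
        out.append(e / math.sqrt(s2_del * (1 - h)))
    return out

# Hypothetical runs: a clean linear trend with one deliberately shifted point.
x = list(range(1, 11))
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [1 + 2 * xi + ni for xi, ni in zip(x, noise)]
y[5] += 5.0                                # outlier at x = 6
t = externally_studentised(x, y)
for xi, ti in zip(x, t):
    flag = "  <-- outside +/-3" if abs(ti) > 3 else ""
    print(f"x={xi:2d}  t={ti:8.2f}{flag}")
```

A common rule of thumb treats values beyond roughly 3 in absolute value as suspect; the exact limits drawn in a given software package may differ.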
Triaxial resilient modulus regression models for cold recycled asphalt mixtures
Published in Road Materials and Pavement Design, 2023
João Paulo Costa Meneses, Kamilla Vasconcelos, Linda Lee Ho, Liedi L. B. Bernucci
The linear least-squares method (LSM) is a commonly used statistical tool in linear regression analysis. This method seeks to minimise the sum of the squared residuals. Residuals are defined as the differences between the observed values (test values) and the estimated values (modelled values). Because it is simple and direct, the LSM is widely used to fit linear regression models. However, the models evaluated in this work are not linear, as the relation between TxRM (dependent variable) and the independent variables is not linear. The literature mentions that the LSM solution is not adequate for fitting non-linear models (or non-linearisable models) (Rawlings et al., 1998). For those, non-linear regression methods are necessary (Draper & Smith, 1998). The models in Equations (3)–(5) can be linearised after appropriate variable transformations; however, doing so would make the residual analysis significantly more complicated. Therefore, in this study, a non-linear estimation method was used, specifically the Levenberg–Marquardt algorithm (LMA) (or Marquardt’s compromise).
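The idea behind the Levenberg–Marquardt algorithm can be sketched in a few lines. The example below is not the study's implementation and does not use Equations (3)–(5): it assumes a hypothetical two-parameter model y = a·exp(b·x), synthetic noise-free data, and a simple damping schedule that multiplies or divides the damping factor by 10. Each iteration solves damped normal equations; small damping behaves like Gauss–Newton, large damping like a short gradient-descent step.

```python
import math

def model(x, a, b):
    """Hypothetical non-linear model used only for this illustration."""
    return a * math.exp(b * x)

def sse(xs, ys, a, b):
    """Sum of squared residuals for parameters (a, b)."""
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))

def levenberg_marquardt(xs, ys, a, b, lam=1e-2, iters=100):
    cost = sse(xs, ys, a, b)
    for _ in range(iters):
        # Accumulate J^T J and J^T r, where J holds the model derivatives.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e
            da, db = e, a * x * e          # d(model)/da, d(model)/db
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Marquardt damping: scale the diagonal of J^T J by (1 + lam).
        a11, a22, a12 = jaa * (1 + lam), jbb * (1 + lam), jab
        det = a11 * a22 - a12 * a12
        if det == 0:
            break
        step_a = (a22 * ga - a12 * gb) / det
        step_b = (a11 * gb - a12 * ga) / det
        new_cost = sse(xs, ys, a + step_a, b + step_b)
        if new_cost < cost:                # accept step, trust the model more
            a, b, cost = a + step_a, b + step_b, new_cost
            lam /= 10
        else:                              # reject step, increase damping
            lam *= 10
    return a, b, cost

# Synthetic data generated from a = 2, b = 0.5 (assumed values).
xs = [0.5 * i for i in range(7)]
ys = [model(x, 2.0, 0.5) for x in xs]
a_hat, b_hat, final_cost = levenberg_marquardt(xs, ys, a=1.0, b=0.1)
print(f"a = {a_hat:.6f}, b = {b_hat:.6f}, SSE = {final_cost:.3e}")
```

Because the fit is performed on the original scale, the residuals keep their original meaning, which is the advantage the authors cite over linearising the models first.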
Thermodynamic performance and emission prediction of CI engine fueled with diesel and Vachellia nilotica (Babul) biomass-based producer gas and optimization using RSM
Published in Petroleum Science and Technology, 2022
Jeewan Vachan Tirkey, Deepak Kumar Singh
The normal probability plots for all three output responses are obtained using the RSM technique. These plots are important for assessing the adequacy of the model. In this type of plot, the X-axis shows the residuals and the Y-axis shows the percent value. A residual is the difference between the exact value and the predicted value. Exact values are those obtained using FORTRAN. Predicted values are those obtained from the regression equations using ANOVA. The plots show the scattered output response points together with a straight line that represents the theoretical normal distribution. The closer the scattered points lie to this line, the more adequate the developed model, and vice versa. Figure 5a depicts the normal probability plots. The Pareto chart depicts the influence of multiple independent parameters in predicting the output response. Figure 5b depicts the Pareto plots for engine performance and exhaust emissions. It can be inferred from Figure 5b that CO and NO emissions are mainly influenced by the injection timing. All the operating parameters except the linear term of injection timing are insignificant in predicting CO emission. The linear and square terms of blending ratio and injection timing are significant in predicting NO emission; all other terms are insignificant. It can also be inferred from the plot that the performance parameters are mainly influenced by the blending ratio of producer gas.
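The coordinates of such a normal probability plot can be computed directly, as in the sketch below. It assumes the plotting position (i - 0.5)/n for the i-th smallest residual, which is one common convention and may differ from the one used by the authors' software; the residual values and function names are made up for illustration. The correlation between the ordered residuals and the theoretical normal quantiles gives a rough numerical measure of how closely the points follow the straight line.

```python
import math
from statistics import NormalDist

def normal_probability_points(residuals):
    """Return (residual, percent, normal_quantile) for each ordered residual."""
    n = len(residuals)
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        p = (i - 0.5) / n                  # assumed plotting position
        points.append((r, 100 * p, NormalDist().inv_cdf(p)))
    return points

def straightness(points):
    """Pearson correlation between ordered residuals and normal quantiles."""
    rs = [r for r, _, _ in points]
    zs = [z for _, _, z in points]
    rbar, zbar = sum(rs) / len(rs), sum(zs) / len(zs)
    srz = sum((r - rbar) * (z - zbar) for r, z in zip(rs, zs))
    srr = sum((r - rbar) ** 2 for r in rs)
    szz = sum((z - zbar) ** 2 for z in zs)
    return srz / math.sqrt(srr * szz)

# Hypothetical residuals (exact value minus predicted value).
residuals = [-1.2, 0.3, 0.8, -0.4, 0.1, -0.7, 1.5, 0.0, -0.2, 0.6]
points = normal_probability_points(residuals)
for r, pct, z in points:
    print(f"residual={r:5.2f}  percent={pct:5.1f}  normal quantile={z:6.3f}")
print(f"straightness (correlation): {straightness(points):.3f}")
```

A correlation close to 1 corresponds to points lying near the reference line; it complements the visual judgement and does not replace a formal normality test.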
Photo-treatment of TNT wastewater in the presence of nanocomposite of WO3/Fe3O4
Published in Particulate Science and Technology, 2021
Hamid Reza Pouretedal, Zahra Bashiri, Mohammad Nasiri, Ali Arab
The normal probability plot of residuals, the plot of residuals versus order, and the plot of residuals versus fits are shown in Figures 6-I, 6-II, and 6-III, respectively. A residual plot is a graph that is used to examine the goodness-of-fit in regression and ANOVA (Behera et al. 2018; Pouretedal, Damiri, and Shahsavan 2018). Residual plots are examined to determine whether the ordinary least squares assumptions are being met. If these assumptions are satisfied, then ordinary least squares regression will produce unbiased coefficient estimates with the minimum variance. The normal probability plot of residuals is used to verify the assumption that the residuals are normally distributed. The plot of residuals versus order is used to verify the assumption that the residuals are uncorrelated with each other; independent residuals show no trends or patterns when displayed in time order. Finally, the plot of residuals versus fits is used to verify the assumption that the residuals have a constant variance.
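The two remaining checks can be accompanied by simple numbers. The sketch below is a hedged supplement, not the authors' procedure (they rely on the plots): it uses the Durbin–Watson statistic as a numerical companion to the residuals-versus-order plot and a crude split-half spread ratio as a companion to the residuals-versus-fits plot. The function names and the example values are assumptions.

```python
def durbin_watson(resid_in_run_order):
    """Durbin-Watson statistic: near 2 suggests no lag-1 autocorrelation,
    near 0 positive autocorrelation, near 4 negative autocorrelation."""
    e = resid_in_run_order
    num = sum((b - a) ** 2 for a, b in zip(e, e[1:]))
    den = sum(v * v for v in e)
    return num / den

def spread_ratio(fits, resid):
    """Mean squared residual in the upper half of fitted values divided by
    that in the lower half; values far from 1 hint at non-constant variance."""
    ordered = [r for _, r in sorted(zip(fits, resid))]
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return (sum(r * r for r in upper) / half) / (sum(r * r for r in lower) / half)

# Hypothetical residuals listed in run order, with their fitted values.
trending = [1.0, 2.0, 3.0, 4.0]            # drifts with run order
alternating = [1.0, -1.0, 1.0, -1.0]       # flips sign every run
print("DW, trending residuals:   ", durbin_watson(trending))
print("DW, alternating residuals:", durbin_watson(alternating))

fits = [1, 2, 3, 4, 5, 6, 7, 8]
fanning = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]   # spread grows with fit
print("spread ratio, fanning residuals:", spread_ratio(fits, fanning))
```

These summaries are only screening aids; a pattern visible in the plots should still be investigated even when the numbers look unremarkable.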