Implementation of Data-Driven Approaches for Condition Assessment of Structures and Analyzing Complex Data
Published in M.Z. Naser, Leveraging Artificial Intelligence in Engineering, Management, and Safety of Infrastructure, 2023
Vafa Soltangharaei, Li Ai, Paul Ziehl
Acoustic emission (AE) waveforms are non-stationary signals (Suzuki et al., 1996a). Therefore, various statistical and signal processing methods can be used to analyze the data and derive essential information from it. Outliers in signal features can be identified using statistical methods. In statistics, an outlier is defined as an observation that deviates markedly from the other observations (Hawkins, 1980). Outliers in a dataset may cause confusion and errors in some data-driven techniques; for instance, k-means clustering is sensitive to outliers and is expected to misclassify a dataset that contains them (Tan et al., 2013). Therefore, for some data-driven techniques it is important to identify and remove outliers before conducting any analysis. One of the simplest statistical methods is end-trimming, in which a fixed percentage (e.g., 5% or 10%) of the highest and lowest values is treated as outliers and removed from the data set (Ott and Longnecker, 2015). Another common statistical method uses box-and-whisker plots. Once the median, lower quartile, and upper quartile of a dataset have been calculated, a boxplot can be drawn. Inner and outer fences are then determined from the quartiles and the interquartile range (IQR). Finally, observations lying between the inner and outer fences are identified as mild outliers, and observations beyond the outer fences as extreme outliers (Ott and Longnecker, 2015).
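The two screening rules above can be sketched in Python as follows. This is a minimal sketch, assuming inner fences at 1.5 × IQR and outer fences at 3 × IQR beyond the quartiles, a convention used in many statistics texts. Quartile definitions differ between texts and software; this sketch uses the "inclusive" interpolation of Python's `statistics.quantiles`. The function names `end_trim`, `box_fences` and `classify_outliers`, and the sample values, are hypothetical.

```python
import statistics


def end_trim(data, fraction=0.05):
    """End-trimming: drop the lowest and highest `fraction` of observations."""
    xs = sorted(data)
    k = int(len(xs) * fraction)  # number of values removed from EACH end
    return xs[k:len(xs) - k]


def box_fences(data):
    """Return (inner, outer) fences, each as a (low, high) tuple.

    Assumed convention: inner fences lie 1.5*IQR and outer fences 3*IQR
    beyond the lower (Q1) and upper (Q3) quartiles.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return inner, outer


def classify_outliers(data):
    """Mild outliers lie between the inner and outer fences; extreme ones lie beyond the outer fences."""
    inner, outer = box_fences(data)
    mild, extreme = [], []
    for x in data:
        if x < outer[0] or x > outer[1]:
            extreme.append(x)
        elif x < inner[0] or x > inner[1]:
            mild.append(x)
    return mild, extreme


sample = [10, 11, 12, 12, 13, 13, 14, 15, 22, 40]  # made-up feature values
print(classify_outliers(sample))  # ([22], [40])
print(end_trim(sample, 0.10))     # [11, 12, 12, 13, 13, 14, 15, 22]
```

End-trimming always removes the same share of the data whether or not it contains outliers, whereas the fence rule removes only values that are far from the quartiles.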
Visual approaches to drought analysis
Published in Vitali Díaz Mercado, Spatio-Temporal Characterisation of Drought: Data Analytics, Modelling, Tracking, Impact and Prediction, 2022
In the tasks mentioned above, several kinds of data visualisation are frequently used to explore or explain drought-related data. To analyse or show how a drought characteristic changes over time, line graphs or area charts are often selected, while scatterplots are preferred for analysing the correlation between different drought characteristics. Histograms are often used to present the statistical distribution, whereas boxplots are chosen to summarise the statistics visually (e.g. median, quartiles, minimum, maximum). To identify patterns in the variation of drought over time, heat maps, also referred to as colour-coded tables, are used. In this columns-rows arrangement, the information is organised to show how a given drought characteristic changes over time. The colour assigned to each cell relates to the magnitude of the analysed drought characteristic, so adjacent cells of similar colour aligned in columns (or rows) make it possible to identify periods of high (or low) intensity.
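The colour-coded table described above can be sketched as a year-by-month grid in which each cell is mapped to a colour class. This is a minimal pure-Python sketch; the records, the bin edges, the colour names and the assumption that lower index values mean more intense drought are all hypothetical and serve only to illustrate the columns-rows arrangement. In practice a plotting library would render the grid; the sketch stops at the cell colours.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical colour classes for a drought index in which lower values mean
# drier conditions; real studies use index-specific thresholds.
BIN_EDGES = [-2.0, -1.0, 0.0]
COLOURS = ["dark red", "orange", "yellow", "white"]  # most to least intense drought


def year_month_table(records):
    """Arrange (year, month, value) records into rows = years, columns = months 1-12.

    Months without data stay None.
    """
    rows = defaultdict(lambda: [None] * 12)
    for year, month, value in records:
        rows[year][month - 1] = value
    return dict(sorted(rows.items()))


def colour_of(value):
    """Map one cell value to a colour class; missing cells are grey.

    A value equal to a bin edge falls into the less intense class.
    """
    if value is None:
        return "grey"
    return COLOURS[bisect_right(BIN_EDGES, value)]


records = [(2020, 1, -0.5), (2020, 2, -1.5), (2021, 1, -2.5)]  # made-up values
table = year_month_table(records)
colours = {year: [colour_of(v) for v in row] for year, row in table.items()}
print(colours[2020][:3])  # ['yellow', 'orange', 'grey']
print(colours[2021][0])   # dark red
```

Reading down a column compares the same month across years, while reading along a row follows one year month by month, which is how runs of similarly coloured cells reveal drought periods.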
Statistical Methods for Reproducible Data Analysis
Published in Asis Kumar Tripathy, Chiranji Lal Chowdhary, Mahasweta Sarkar, Sanjaya Kumar Panda, Cognitive Computing Using Green Technologies, 2021
Sambit Kumar Mishra, Mehul Pradhan, Rani Aiswarya Pattnaik
Outliers: Any value that lies far outside the range of the rest of the data is termed an outlier [14]. Common reasons for outliers are:
Typos: outliers introduced during data collection or entry. For example, adding an extra zero by mistake.
Measurement error: outliers caused by a faulty measuring instrument or operator.
Intentional error: errors that people introduce deliberately. For example, teenagers might claim they have drunk less alcohol than they actually have.
Legitimate outliers: values that are not errors but are present in the data for legitimate reasons. For example, a CEO's salary might be remarkably high compared with those of other employees.
Outliers can be detected using a box plot: data values that fall outside the whiskers of the box plot are easy to spot as outliers.
Screening and optimization method of defect points of G code in three axis NC machining
Published in International Journal of Computer Integrated Manufacturing, 2023
Dun Lyu, Yanhong Song, Pei Liu, Wanhua Zhao
A box plot is used to analyze the error data obtained in Section 2.2.2, and the PS results are obtained in this section. A box plot describes data through the lower quartile, median, upper quartile, upper limit and lower limit. It is constructed as follows. Suppose a sequence contains n items; arrange them in ascending order. Determine the lower quartile Q1, the median Q2 and the upper quartile Q3 from their positions in the sorted sequence. Then compute the upper limit and the lower limit from the quartiles. Finally, draw the box plot, as shown in Figure 8. In Figure 8, values greater than the upper limit are outliers, and the corresponding points are the defect points.
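The construction above can be sketched as follows. The excerpt does not give the quartile-position and limit formulas explicitly, so this sketch assumes quartiles located at positions i(n+1)/4 of the sorted data with linear interpolation (the default "exclusive" method of Python's `statistics.quantiles`) and limits at 1.5 × IQR beyond the quartiles. The function names and the error values are hypothetical.

```python
import statistics


def box_plot_limits(errors, k=1.5):
    """Return (lower, Q1, Q2, Q3, upper) for a sequence of error values.

    Assumptions: quartiles at positions i(n+1)/4 of the sorted data,
    interpolating between neighbours (statistics' default 'exclusive'
    method), and limits k*IQR beyond the quartiles with k = 1.5.
    """
    q1, q2, q3 = statistics.quantiles(errors, n=4)  # sorts a copy internally
    iqr = q3 - q1
    return q1 - k * iqr, q1, q2, q3, q3 + k * iqr


def defect_point_indices(errors, k=1.5):
    """Indices of points whose error exceeds the upper limit (treated as defect points)."""
    *_, upper = box_plot_limits(errors, k)
    return [i for i, e in enumerate(errors) if e > upper]


errors = [3, 5, 4, 6, 5, 4, 30, 5]  # made-up per-point errors
print(box_plot_limits(errors))
print(defect_point_indices(errors))  # [6]
```

Returning indices rather than values keeps the link back to the original points, which is what is needed to locate the defect points in the sequence.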
Statistics of Atterberg limit values of some pure kaolinitic clays
Published in Geomechanics and Geoengineering, 2023
Giovanni Spagnoli, Satoru Shimobe
Figure 3(A and B) shows the box plots and the dot plots, respectively, of the Atterberg limits listed in Table 1. Figure 3A shows a relatively high dispersion of the liquid limit (LL) values compared with the plastic limit (PL) and plasticity index (PI). In addition, the observed values cover a higher range than the average values reported by Holtz and Kovacs (1981), which were developed from the data of Mitchell (1976) for kaolinitic clays. In the box plots, the box represents the interquartile range (IQR), with the bottom and top of the box at the 25th and 75th percentiles, respectively. The whiskers extend to the last data value inside the inner fence, which lies 1.5 times the IQR beyond the edge of the box. In Figure 3A, some points are identified as 'suspected outliers', which are those falling outside the inner fence. Figure 3B simply shows the shape of the data, which seems to be fairly normal, at least visually.
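The whisker rule described above can be sketched as follows: each whisker stops at the most extreme data value still inside its inner fence, and anything beyond the fence is flagged as a suspected outlier. This is a minimal sketch assuming the "inclusive" quartile interpolation of Python's `statistics.quantiles`, since the excerpt does not state which quartile definition was used; `whisker_summary` and the liquid-limit values are hypothetical.

```python
import statistics


def whisker_summary(data, k=1.5, method="inclusive"):
    """Return ((low_whisker, high_whisker), suspected_outliers).

    Inner fences lie k*IQR beyond the box edges (Q1 and Q3). Each whisker
    ends at the most extreme data value still inside its fence; values
    outside the inner fences are returned as suspected outliers.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4, method=method)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    suspected = sorted(x for x in data if x < low_fence or x > high_fence)
    return (min(inside), max(inside)), suspected


ll_values = [20, 22, 25, 25, 26, 28, 30, 31, 33, 60]  # made-up liquid limits (%)
print(whisker_summary(ll_values))  # ((20, 33), [60])
```

Note that the whisker ends are actual data values (20 and 33 here), not the fence positions themselves, which is why whiskers in such plots often have unequal lengths.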
A multi-output deep learning model based on Bayesian optimization for sequential train delays prediction
Published in International Journal of Rail Transportation, 2022
Jie Luo, Ping Huang, Qiyuan Peng
Next, we analysed the residuals by station using a standard statistical graph, the box plot. The residual is calculated with Equation 18. The box plot conveys the median (the line in the box), the quartiles (the upper and lower bounds of the box), the interquartile range (the height of the box), the edge values (the upper and lower horizontal lines connected to the box), and the outliers (the points outside the edge values). Figure 10 shows that the median is near 0 min and that the absolute values of the quartiles are within 1 min. The figure also shows that the box of station ZZW is the tallest while that of station QY is the shortest, which implies that the model's performance gradually decreases as trains move from station QY to station ZZW. This may be caused by the increasingly complex interactions between trains.
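The per-station comparison can be sketched as follows. This is a minimal sketch with made-up residuals for two hypothetical stations "A" and "B", since the excerpt does not reproduce the data or Equation 18; each station's residuals are summarised by their quartiles, and stations are ranked by box height (IQR), following the reasoning that a taller box indicates a wider spread of prediction errors. The "inclusive" quartile method is an assumption.

```python
import statistics


def station_box_stats(residuals_by_station):
    """Median, quartiles and box height (IQR) of the residuals at each station."""
    stats = {}
    for station, res in residuals_by_station.items():
        q1, median, q3 = statistics.quantiles(res, n=4, method="inclusive")
        stats[station] = {"median": median, "q1": q1, "q3": q3, "iqr": q3 - q1}
    return stats


# Made-up residuals in minutes for two hypothetical stations.
residuals = {
    "A": [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4],
    "B": [-1.0, -0.6, 0.0, 0.2, 0.7, 1.1],
}
stats = station_box_stats(residuals)
# A taller box (larger IQR) means more spread in the residuals at that station.
ranked = sorted(stats, key=lambda s: stats[s]["iqr"])
print(ranked)  # ['A', 'B']
```

Both stations have a median residual near zero, so the ranking is driven entirely by spread, which is the same distinction the box heights make visually.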