An investigation of carefulness among students using an educational game for Physics
Published in Shin-ya Nishizaki, Masayuki Numao, Jaime Caro, Merlin Teodosia Suarez, Theory and Practice of Computation, 2019
In the field of data mining, outliers are usually removed because they distort the resulting model. Unhandled outliers skew the model so that the mean and covariance estimates of the observations do not reflect the actual behavior of the data; hence the need to detect and remove them. Outlier detection has been used to find anomalies in datasets in order to better understand how far observation points deviate from the central tendencies or behavior exhibited by the entire sample. It involves finding patterns such as anomalies, discordant observations, faults, defects, or peculiarities in the data that do not conform to expected behavior (Chandola, Banerjee & Kumar, 2009). Other work in outlier detection uses the standard deviation to determine whether a point is an outlier. A threshold is set as the criterion; for example, if an observation lies three (3) standard deviations from the mean, it is considered an outlier. Because outliers themselves affect the standard deviation and the mean, using standard deviations to detect outliers may be problematic. Hence, a number of outlier detection studies have used clustering to detect outliers (Elahi, et al. 2008; Pamula, Deka & Nandi 2011). In Educational Data Mining (EDM), outlier analysis has led to an understanding of students’ behavior and to the detection of students with learning problems (Romero & Ventura 2007). Outlier analysis has also been used to examine the factors that affect student achievement, particularly in over- and under-performing schools in the US. Clustering analysis in educational data mining answers research questions pertinent to understanding learners’ behavior during their use of educational software or intelligent tutoring systems (Baker & Yacef 2009). In general, it is used to discover natural groupings of data with the goal of achieving homogeneity.
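The weakness of the standard-deviation rule mentioned above can be illustrated with a small Python sketch. It is not taken from the cited work: the function name `three_sigma_outliers`, the strict "more than three" comparison, the use of the population standard deviation, and the sample readings are all assumptions made for the example.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption of this sketch)
    if sigma == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical data: twenty ordinary readings close to 10.
readings = [10.0, 10.5, 9.5, 10.2, 9.8] * 4

# One extreme value: the rule flags it.
flagged_single = three_sigma_outliers(readings + [100.0])

# Four extreme values: they inflate the mean and the standard deviation
# so much that none of them exceeds the three-sigma threshold any more.
flagged_many = three_sigma_outliers(readings + [100.0] * 4)

print(flagged_single)
print(flagged_many)
```

With a single extreme value the rule behaves as intended, but with four such values among 24 observations the estimates shift enough that nothing is flagged, which is one reason density-based or clustering-based methods may be preferred.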
For this paper, we used X-means clustering (Pelleg & Moore 2000) because we wanted the algorithm to discover the optimal number of clusters that form naturally in the data (Jain 2010). Outliers were detected using the local outlier factor (LOF), in which the local density of an object is compared with the local densities of its neighbors (Breunig, et al. 2000). Outliers are the points whose density is lower than that of their neighboring points.
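The density comparison behind LOF can be sketched in plain Python. This is a minimal illustration following the definitions of Breunig et al. (2000) as they are commonly stated (k-distance, reachability distance, local reachability density); it is not the implementation used in the paper, and the function name `local_outlier_factors`, the choice k = 3, and the sample points are hypothetical.

```python
import math


def local_outlier_factors(points, k):
    """Compute the LOF score of every point; scores well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance = []  # distance from each point to its k-th nearest neighbor
    neighbors = []   # indices within the k-distance (ties are included)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_distance.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    # Note: heavily duplicated points (zero distances) are not treated
    # specially in this sketch and can yield infinite densities.
    lrd = []
    for i in range(n):
        total = sum(reach_dist(i, j) for j in neighbors[i])
        lrd.append(len(neighbors[i]) / total if total > 0 else math.inf)

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: a tight group of five points and one distant point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = local_outlier_factors(sample, k=3)
for point, score in zip(sample, scores):
    print(point, round(score, 2))
```

In this sketch the five grouped points receive scores close to 1, while the distant point receives a much larger score because its local density is far lower than that of its neighbors.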
Neighborhood optimization of intelligent wireless mobile network based on big data technology
Published in International Journal of Computers and Applications, 2021
Figure 8 shows that the larger the number of dropouts in the network, the lower the utilization rate of the equipment. In terms of network utilization, at the same utilization level, a larger number of connected devices together with more dropped lines indicates that the capacity of the network is limited. When new devices are connected, overload and dropped lines are likely. In addition, in some communities there are more connected devices during certain time periods, and the network utilization rate at those times is nearly 100%. If network devices need to be added in the future, network overload may occur and the number of dropouts will continue to increase. The anomaly detection method based on the improved LOF algorithm can effectively eliminate the influence of abnormal data on feature extraction and time series prediction, and improve the accuracy and speed of feature extraction and data prediction.
Health assessment of wind turbine based on laplacian eigenmaps
Published in Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, 2020
Tao Liang, Zhaochao Meng, Jie Cui, Zongqi Li, Huan Shi
The presence of outliers can have a large impact on health assessment results. The Local Outlier Factor (LOF) algorithm (Breunig et al. 2000) is a density-based anomaly detection method. It judges whether a point is abnormal by examining the degree of difference between the density of the object and that of its neighbors. Wind turbine operation data are high-dimensional and nonlinear. In this study, the Gaussian kernel density estimation Local Outlier Factor (GLOF) algorithm (Tang and He 2017) was used to detect anomalous points. The degree of difference between data points was measured by the local outlier factor with a Gaussian kernel.
Enabling low-cost automatic water leakage detection: a semi-supervised, autoML-based approach
Published in Urban Water Journal, 2022
Willian Muniz Do Nascimento, Luiz Gomes-Jr
The algorithms used in implementations (i) and (iii) were: (a) a density-based technique called Local Outlier Factor (LOF), (b) a neural-network-based technique called Self-Organizing Maps (SOM), and (c) a statistical approach called Standard Score (Z-Score). We added a fourth algorithm based on hard-coded rules that simulate the decision process currently used in the water supply company. This algorithm was named Specialist and served as a baseline to assess the performance of the other algorithms. Implementation (ii) does not use the aforementioned algorithms, since it applies its own supervised algorithms in the optimization process.