Data Collection and Analysis
Published in James William Martin, Lean Six Sigma for the Office, 2021
The box plot shown in Figure 5.14 provides a non-parametric graph of a continuous variable. It shows both the central location (the median) and the dispersion, represented by the range of the dataset. The range is calculated as the maximum value minus the minimum value. Other statistics are also provided: the 25th percentile, the median, and the 75th percentile. The 25th percentile is the data point below which 25% of the values fall. The median is the 50th percentile, the data point below which 50% of the values fall. Finally, the 75th percentile is the data point below which 75% of the values fall. Several box plots can be displayed in a comparative manner as shown in Figure 5.14. Discrete variables, having a common and continuous scale relative to each other, are displayed on the same graph. In the example shown in Figure 5.14, lost time is broken into the process waste categories that were shown in Figure 5.6. We can also see that the waiting category has a higher median lost time than excess inventory and a higher variation of time relative to the other categories. An asterisk represents data points marked as outliers. An outlier is a data point that is likely to be different from most of the sample data, i.e., furthest from the central location of the data. Box plots are a basis from which more advanced statistical methods can be applied to analyze the process data.
Outlier Detection and Removal: An Efficient and Effective Concept in Healthcare Sector
Published in Durgesh Kumar Mishra, Nilanjan Dey, Bharat Singh Deora, Amit Joshi, ICT for Competitive Strategies, 2020
Rahul Kumar, Rohitash Kumar Banyal
Indeed, outlier detection is an important subject in areas such as credit card fraud detection, network anomaly identification, intrusion detection, traffic and transportation outliers, industrial damage detection, fake news and misleading information, security and surveillance, and criminal activities in e-commerce (Powar 2017). Outlier detection aims to find patterns in data that do not conform to expected behavior. Its areas of application include insurance, telecommunication, customer segmentation, intrusion detection in cyber-security, fault detection in safety-critical systems, and medical analysis (An Introduction to Outliers – What are Outliers – Types of Outliers n.d.). Outlier detection methods include graphical visualization, box plots, standard deviation, and histogram tools (Bhattarai and Mn 2009).
Image and Localization of Behind-the-Wall Targets Using Collocated and Distributed Apertures
Published in Moeness G. Amin, Through-the-Wall Radar Imaging, 2017
There are a number of reasons to use composite CW waveforms with more than two frequencies. One apparent advantage of using multiple frequency components is to achieve frequency diversity against noise and propagation fading. Through-the-wall radar systems are often operated in a low SNR environment due to wall attenuation and other factors, and thus the phase information may not be reliable. Multipath propagation may further introduce frequency-dependent fading, causing very weak signals in some frequencies. By using M equally spaced frequencies, f0, f0 + Δf,…, f0 + (M − 1) Δf, where Δf is the difference between two adjacent carrier frequencies, the phase differences obtained from the M − 1 pairs with adjacent frequencies can be fused to yield robust range estimation against noise and frequency-selective multipath fading. One way to fuse these data is through a weighted average of the range estimates, with a high weight being assigned to the strong frequency pairs and a low weight to the weak ones [8]. When there is a possibility of having significant errors, an outlier analysis that excludes abnormal data is helpful. When equally spaced frequencies are used, the maximum unambiguous range remains c/(2Δf), which is the same as a dual-frequency radar with a frequency separation of Δf.
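The weighted-average fusion described above can be sketched as follows. This is a minimal illustration, not the method of reference [8]: the function names, the sign convention for the two-way phase (phase = 4πfR/c), and the choice of weights (the product of the two received amplitudes in each adjacent pair) are assumptions made for the example.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def max_unambiguous_range(delta_f):
    """Maximum unambiguous range c / (2 * delta_f) for carrier spacing delta_f in Hz."""
    return C / (2.0 * delta_f)


def fuse_range_estimates(phases, amplitudes, delta_f):
    """Weighted-average range estimate from M equally spaced CW frequencies.

    phases:     measured two-way phase in radians at each of the M frequencies
    amplitudes: received signal magnitude at each frequency (assumed weight source)
    delta_f:    spacing between adjacent carrier frequencies in Hz
    """
    phases = np.asarray(phases, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)

    # Phase difference of each of the M - 1 adjacent pairs, wrapped into [0, 2*pi).
    dphi = np.mod(np.diff(phases), 2.0 * np.pi)

    # Range implied by each pair: R = c * dphi / (4 * pi * delta_f).
    ranges = C * dphi / (4.0 * np.pi * delta_f)

    # Assumed weighting: strong pairs (large amplitude product) count more.
    weights = amplitudes[:-1] * amplitudes[1:]
    return float(np.sum(weights * ranges) / np.sum(weights))


# Hypothetical noiseless example: target at 3 m, four carriers spaced 10 MHz apart.
delta_f = 10e6
freqs = 1e9 + delta_f * np.arange(4)
phases = np.mod(4.0 * np.pi * freqs * 3.0 / C, 2.0 * np.pi)
estimate = fuse_range_estimates(phases, [1.0, 0.2, 0.9, 1.0], delta_f)
```

With noisy phases, a plain weighted average can misbehave when the true range is close to zero or to c/(2Δf), because the wrapped phase differences then straddle the 0/2π boundary. That is one situation where the outlier analysis mentioned in the text would help.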
Wind speed forecasting using deep learning and preprocessing techniques
Published in International Journal of Green Energy, 2023
It is important to identify outliers in a time series, since outliers can degrade the performance of the forecast model and reduce its accuracy and reliability. For outlier detection, statistical methods are the most common and can be divided into density-based, distance-based, correlation-based, and image-based (Zou and Djokic 2020). In this case, the Interquartile Range (IQR) will be used to detect outliers. Figure 20 illustrates the IQR boundaries of the data (Frost 2021). Q1 is the first quartile and refers to 24.65% of the data that lies between minimum and Q1. Q3 is the third quartile, below which almost 75% of the data lie. The difference between Q3 and Q1 is referred to as the Interquartile Range (IQR). Outliers can be identified as points lying more than 1.5 × IQR or 3 × IQR beyond the central 50% of the data, i.e., below Q1 or above Q3. The 1.5 × IQR boundary identifies minor outliers and the 3 × IQR boundary major outliers. For this case study, the 3 × IQR boundary will be used, which detects the major outliers.
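The IQR rule above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name `iqr_outlier_mask`, the sample values, and the use of NumPy's default (linear-interpolation) percentile definition are assumptions.

```python
import numpy as np


def iqr_outlier_mask(values, k=3.0):
    """Return a boolean mask that is True where a value lies outside the IQR fences.

    k = 1.5 flags minor outliers, k = 3.0 flags major outliers only.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr  # lower fence
    upper = q3 + k * iqr  # upper fence
    return (values < lower) | (values > upper)


# Hypothetical wind speed samples in m/s.
wind_speed = np.array([4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 8.5, 25.0])

minor_mask = iqr_outlier_mask(wind_speed, k=1.5)
major_mask = iqr_outlier_mask(wind_speed, k=3.0)
cleaned = wind_speed[~major_mask]
```

For these sample values, Q1 = 5.0 and Q3 = 6.3, so the 1.5 × IQR rule flags both 8.5 and 25.0, while the 3 × IQR rule flags only 25.0.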
Exploratory Data Analytics and PCA-Based Dimensionality Reduction for Improvement in Smart Meter Data Clustering
Published in IETE Journal of Research, 2023
A box plot displays the distribution of data using five summary points – “minimum”, “first quartile” (Q1), “median”, “third quartile” (Q3), and “maximum” as shown in Figure 1. Here, “minimum” represents the minimum electricity consumption of a user in an epoch, “Q1” represents the value below which 25% of datapoints fall in an epoch, “median” represents the value below which 50% of datapoints fall in an epoch, “Q3” represents the value below which 75% of datapoints fall in an epoch, and “maximum” gives the maximum energy consumption in the respective epoch. The box plot also helps in identifying outliers present in the data, checking whether the data is symmetrical, identifying how tightly packed the data is, and determining the skewness of the data. In this work, the box plot helps in identifying the distribution of electricity consumption data in each epoch, comparing the distribution of different types of users, and finding the Time of Use effect on the distribution pattern of users in a day [15].
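The five summary points can be computed directly, as in the sketch below. The function name and the consumption values are hypothetical, and NumPy's default percentile definition (linear interpolation) is only one of several quartile conventions, so other tools may give slightly different Q1 and Q3 values.

```python
import numpy as np


def five_number_summary(values):
    """Return the five summary points that a box plot displays."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {
        "minimum": float(values.min()),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "maximum": float(values.max()),
    }


# Hypothetical electricity consumption of one user in one epoch (kWh).
consumption = [0.2, 0.4, 0.5, 0.7, 1.9]
summary = five_number_summary(consumption)
```

Comparing the distances on either side of the median gives a quick indication of skewness: here the maximum lies much further above the median than the minimum lies below it, which suggests a right-skewed distribution.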
Combination of Deep Learning Models for Student’s Performance Prediction with a Development of Entropy Weighted Rough Set Feature Mining
Published in Cybernetics and Systems, 2023
Sateesh Nayani, Srinivasa Rao P, Rajya Lakshmi D
Outlier removal: In this technique, outliers are detected and removed using Python. The analysis of outlier detection is also referred to as outlier mining. There are many ways to remove outliers; in this work, the pandas DataFrame is used for the outlier removal process. The pandas DataFrame is utilized for real-time applications. An outlier is a data point or observation that differs from the rest of the dataset. Outliers are considered samples that deviate from the conventional data. The collected dataset contains extreme values that lie beyond the expected range and differ from the other data. The outlier removal process identifies these unwanted data and eliminates them to improve the performance of the feature mining and deep learning models. The cleaned data is given to the outlier removal process. The outlier removed data is attained as