Data Collection and Analysis
Published in James William Martin, Lean Six Sigma for the Office, 2021
The box plot shown in Figure 5.14 provides a non-parametric graph of a continuous variable. It shows both the central location (the median) and the dispersion, represented by the range of the dataset. The range is calculated as the maximum value minus the minimum value. Other statistics are also provided: the 25th percentile, the median, and the 75th percentile. The 25th percentile is the value below which 25% of the data fall. The median is the 50th percentile, the value below which 50% of the data fall. Finally, the 75th percentile is the value below which 75% of the data fall. Several box plots can be displayed side by side for comparison, as shown in Figure 5.14: the categories of a discrete variable are plotted on the same graph when their values share a common continuous scale. In the example shown in Figure 5.14, lost time is broken down into the process waste categories that were shown in Figure 5.6. We can also see that the waiting category has a higher median lost time than excess inventory and greater variation in time relative to the other categories. An asterisk marks data points flagged as outliers. An outlier is a data point that is likely to be different from most of the sample data, i.e., one that lies far from the central location of the data. Box plots provide a basis from which more advanced statistical methods can be applied to analyze process data.
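As a rough illustration, the sketch below computes the statistics a box plot displays (minimum, 25th percentile, median, 75th percentile, maximum, range) and flags outliers. The section does not say how the asterisked outliers were identified, so the code assumes the widely used 1.5 × IQR rule (points more than 1.5 interquartile ranges beyond the quartiles). The function `box_plot_summary` and the lost-time values are hypothetical, and other software may interpolate percentiles slightly differently.

```python
import statistics


def box_plot_summary(values):
    """Return the statistics a box plot displays, plus flagged outliers.

    Assumption: outliers are points beyond 1.5 * IQR from the quartiles;
    the source does not state which rule its software used.
    """
    data = sorted(values)
    # "exclusive" interpolates at position p * (n + 1); other tools may differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "min": data[0],
        "q1": q1,          # 25th percentile
        "median": median,  # 50th percentile
        "q3": q3,          # 75th percentile
        "max": data[-1],
        "range": data[-1] - data[0],
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


# Hypothetical lost-time samples in minutes, for illustration only.
lost_time = {
    "waiting": [12, 18, 25, 31, 40, 44, 52, 120],
    "excess inventory": [5, 7, 8, 10, 11, 13, 15, 16],
}
for category, times in lost_time.items():
    s = box_plot_summary(times)
    print(f"{category}: median={s['median']}, range={s['range']}, outliers={s['outliers']}")
```

Comparing the per-category summaries mirrors reading the side-by-side box plots: a higher median and a wider spread for one category stand out directly in the numbers.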
A Convergence of Mining and Machine Learning: The New Angle for Educational Data Mining
Published in Vishal Jain, Akash Tayal, Jaspreet Singh, Arun Solanki, Cognitive Computing Systems, 2021
With NLP, machines can now recognize and interpret human language. Examples are (a) chatting via text and (b) semantic search. Artificial intelligence and machine learning can transform education. Machine learning can (a) help teachers attend to every student while teaching a course, (b) make education possible for specially abled students, and (c) support the grading and assessment of students. Machine learning helps us find patterns in data, and from those patterns, predictions about new data points are made. To get those predictions right, we must prepare the data correctly. Table 4.10 lists some of the commonly used Python libraries for machine learning.
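To make "find patterns in data, then predict new data points" concrete, here is a minimal sketch that assumes the simplest possible pattern: a straight line fitted by ordinary least squares. The functions `fit_line` and `predict` and the study-hours data are hypothetical; in practice one would typically reach for a library such as those listed in Table 4.10.

```python
def fit_line(xs, ys):
    """Learn the slope and intercept that minimize squared error (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned pattern to a new data point."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: hours of study vs. exam score.
hours = [1, 2, 3, 4]
scores = [52, 54, 56, 58]
model = fit_line(hours, scores)
print(predict(model, 5))  # → 60.0
```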
Hydrologic frequency analysis
Published in James C. Y. Guo, Urban Flood Mitigation and Stormwater Management, 2017
Outliers are data points that depart significantly from the trend of the remaining data. Outliers can substantially affect the sample statistics. A station skewness coefficient greater than +0.4 suggests high outliers in the record, and one smaller than −0.4 suggests low outliers. When logarithmic values are used for frequency analysis, the Water Resources Council (Bulletin 17, 1982) suggests that outliers be detected with the following thresholds:

$$Q_H = Q_{\log} + Z_0 S_{\log}$$

$$Q_L = Q_{\log} - Z_0 S_{\log}$$
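The excerpt stops before defining its symbols, so the sketch below rests on stated assumptions: Qlog and Slog are taken to be the mean and the sample standard deviation of the base-10 logarithms of the recorded peak flows, and Z0 is a critical value the user must supply (its tabulated values are not part of this excerpt). The thresholds are computed in log space and converted back to flow units. The helper names `log_outlier_thresholds` and `flag_outliers` are hypothetical.

```python
import math
import statistics


def log_outlier_thresholds(flows, z0):
    """Return (Q_high, Q_low) flow thresholds computed in base-10 log space.

    Assumptions (not stated in the excerpt): Qlog is the mean and Slog the
    sample standard deviation of log10(flow); z0 is supplied by the user.
    """
    logs = [math.log10(q) for q in flows]
    q_log = statistics.mean(logs)
    s_log = statistics.stdev(logs)  # sample standard deviation (n - 1)
    high_log = q_log + z0 * s_log   # Q_H = Qlog + Z0 * Slog
    low_log = q_log - z0 * s_log    # Q_L = Qlog - Z0 * Slog
    # Convert the log-space thresholds back to flow units.
    return 10 ** high_log, 10 ** low_log


def flag_outliers(flows, z0):
    """Split flows into high and low outliers relative to the thresholds."""
    q_high, q_low = log_outlier_thresholds(flows, z0)
    high = [q for q in flows if q > q_high]
    low = [q for q in flows if q < q_low]
    return high, low


# Hypothetical annual peak flows and a hypothetical z0, for illustration only.
peaks = [850, 1200, 950, 1100, 40, 1300, 1000, 9800]
print(flag_outliers(peaks, z0=1.5))
```

Working in log space matches the paragraph's premise that logarithmic values are used for the frequency analysis; the choice of Z0 controls how strict the test is.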
Adaptive neuro-fuzzy interface system based performance monitoring technique for hydropower plants
Published in ISH Journal of Hydraulic Engineering, 2022
The scatter plots have been drawn to show how one variable relates to another by plotting data points on a horizontal and a vertical axis. Figure 1 shows the scatter plot of the data together with the correlation coefficients and their probability values (p-values). A p-value indicates how likely it would be to observe a correlation at least this strong if the two variables were actually unrelated. A low p-value therefore makes it implausible that the observed correlation arose by chance between unrelated variables, and, for a given sample size, stronger correlations yield lower p-values. The details of the proposed methodology are shown in Figure 2.
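The meaning of a correlation p-value can be illustrated without any statistics library by a permutation test. This is a swapped-in technique: the paper does not say how its p-values were computed, and the usual choice is a t-based test for Pearson's correlation, which this sketch does not reproduce. Shuffling one variable destroys any real link between the two, so the fraction of shuffles that produce a correlation at least as strong as the observed one estimates how likely such a correlation is when the variables are unrelated. The names `pearson_r` and `permutation_p_value`, the default of 10,000 shuffles, and the sample readings are assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the correlation between xs and ys.

    Shuffling ys breaks any real link with xs, so the share of shuffles whose
    |r| is at least the observed |r| estimates how often a correlation this
    strong would appear if the variables were unrelated.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical paired readings, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
print(pearson_r(x, y), permutation_p_value(x, y))
```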
RKDOS: A Relative Kernel Density-based Outlier Score
Published in IETE Technical Review, 2020
Abdul Wahid, Annavarpu Chandra Sekhara Rao
An outlier is defined as a data point that differs significantly from the other data points. A desirable outlier detection algorithm should not only label each sample as an outlier or an inlier but also assign every sample in the data set a degree of outlier-ness.
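The RKDOS formulation itself is not given in this excerpt, so the sketch below is not the authors' method. It only illustrates the general idea the paragraph describes, a continuous degree of outlier-ness rather than a binary label, using a simplified relative kernel-density score chosen here for one-dimensional data: each point's leave-one-out Gaussian kernel density is compared with the average density of its k nearest neighbors. Scores near 1 suggest inliers, and larger scores suggest stronger outliers. The function names, the bandwidth of 0.3, and k = 3 are assumptions for illustration.

```python
import math


def kernel_densities(data, bandwidth):
    """Leave-one-out Gaussian kernel density estimate at each point."""
    n = len(data)
    norm = 1.0 / ((n - 1) * bandwidth * math.sqrt(2 * math.pi))
    densities = []
    for i, x in enumerate(data):
        total = sum(
            math.exp(-0.5 * ((x - y) / bandwidth) ** 2)
            for j, y in enumerate(data)
            if j != i
        )
        densities.append(norm * total)
    return densities


def relative_density_scores(data, bandwidth=0.3, k=3):
    """Hypothetical score: mean density of the k nearest neighbors / own density."""
    dens = kernel_densities(data, bandwidth)
    scores = []
    for i, x in enumerate(data):
        # Indices of the k nearest other points by absolute distance.
        neighbors = sorted(
            (j for j in range(len(data)) if j != i),
            key=lambda j: abs(data[j] - x),
        )[:k]
        neighbor_mean = sum(dens[j] for j in neighbors) / k
        # A density that underflows to zero means an extremely isolated point.
        scores.append(math.inf if dens[i] == 0 else neighbor_mean / dens[i])
    return scores


# Hypothetical one-dimensional sample with one isolated value.
sample = [1.0, 1.1, 0.9, 1.2, 1.05, 0.95, 5.0]
for value, score in zip(sample, relative_density_scores(sample)):
    print(value, score)
```

Because every sample receives a score, a threshold can still be applied afterward to produce outlier or inlier labels, while the scores themselves rank how anomalous each point is.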