Introduction
Published in Jhareswar Maiti, Multivariate Statistical Modeling in Engineering and Management, 2023
Outliers are observations that do not belong to the general mass of the data. The concept applies to single as well as multiple variables: for a single variable they are termed univariate outliers, and for multiple variables considered simultaneously they are called multivariate outliers. Univariate outliers can be detected using a dot plot. Another approach is to use standardized values of the variable: any standardized value greater than three in absolute magnitude can be considered an outlier at the 99.73% confidence level. For two variables, a bivariate scatter plot can be used (e.g., see Figure 1.12). For multivariate outliers, the Mahalanobis distance is computed for each observation and chi-square percentile values are used to identify outliers. Gnanadesikan and Kettenring (1972) provided a list of statistics for the detection of multivariate outliers, where each statistic focuses on certain key features of outliers. For example, the Mahalanobis distance detects multivariate observations that lie far from the general mass (scatter) of the data.
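As a concrete illustration, the two rules described above (the standardized-value cut-off at three, and the chi-square percentile applied to squared Mahalanobis distances) can be sketched in Python with NumPy and SciPy. The data here are synthetic and the 99.73% level is carried over from the text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # 200 observations, 3 variables
X[0] = [8.0, -7.5, 9.0]                # plant an obvious multivariate outlier

# Univariate rule: |standardized value| > 3 flags an outlier (99.73% level)
z = (X - X.mean(axis=0)) / X.std(axis=0)
uni_outliers = np.any(np.abs(z) > 3, axis=1)

# Multivariate rule: squared Mahalanobis distance vs. a chi-square percentile
mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum('ij,jk,ik->i', X - mu, S_inv, X - mu)   # d^2 for each row
threshold = stats.chi2.ppf(0.9973, df=X.shape[1])      # 3 degrees of freedom
multi_outliers = d2 > threshold
print(np.where(multi_outliers)[0])     # the planted point should appear here
```

A handful of ordinary points may also cross the threshold by chance, which is expected at this percentile; the planted observation, however, stands far above it.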
Single-sensor Real Time Damage Detection Techniques: RSSA and its Variants
Published in Basuraj Bhowmik, Budhaditya Hazra, Vikram Pakrashi, Real-Time Structural Health Monitoring of Vibrating Systems, 2022
Basuraj Bhowmik, Budhaditya Hazra, Vikram Pakrashi
In recent years, TVAR modeling has become widely accepted for real-time damage detection in dynamical systems [4-6, 8]. TVAR estimates its coefficients in real time; the coefficients measure the change in the subspace at each time instant and indicate that change as damage at that instant. One of the premises of multivariate analysis is the measurement of separation between two clusters [15]. In multivariate analysis, the Mahalanobis distance (MD) has found numerous applications as a statistical measure, such as identification of outliers [21], determination of representativity between parent and sampled data sets [22], pattern recognition problems such as the k-nearest neighbor (kNN) method [23], discriminant analysis [24], disjoint modeling techniques for recognition of distributions [25], and more. The use of the Mahalanobis distance, however, depends on clustering of data and is thus not directly applicable to online implementation. In this regard, a recently developed DSF, the recursive Mahalanobis distance (RMD) [7], is implemented for the damage detection case studies in the present work.
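The RMD formulation of reference [7] is not reproduced in this excerpt. As a hedged illustration of the general idea only, the sketch below updates the mean and covariance recursively with a forgetting factor and evaluates a Mahalanobis-type distance at each time instant; the function name, forgetting factor, and regularization constant are illustrative assumptions, not the authors' exact DSF:

```python
import numpy as np

def recursive_mahalanobis(stream, lam=0.99, eps=1e-6):
    """Online Mahalanobis-type distance with exponentially weighted
    (forgetting factor `lam`) estimates of the mean and covariance.
    Generic sketch -- NOT the exact RMD of reference [7]."""
    mu, cov, out = None, None, []
    for x in stream:
        x = np.asarray(x, dtype=float)
        if mu is None:                        # initialize on the first sample
            mu = x.copy()
            cov = np.eye(x.size) * eps
            out.append(0.0)
            continue
        mu = lam * mu + (1 - lam) * x         # recursive mean update
        diff = x - mu
        cov = lam * cov + (1 - lam) * np.outer(diff, diff)  # recursive covariance
        d2 = diff @ np.linalg.solve(cov + eps * np.eye(x.size), diff)
        out.append(float(d2))
    return out

# Example: feed 2-D feature vectors sample by sample; a sustained jump in
# the distance after index 200 signals the simulated subspace change.
rng = np.random.default_rng(3)
stream = rng.normal(size=(300, 2))
stream[200:] += 4.0
d2_series = recursive_mahalanobis(stream)
```

Early values are spiky because the covariance estimate is still poorly conditioned; in practice a burn-in period is discarded before thresholding.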
Shrinkage Methods
Published in Julian J. Faraway, Linear Models with Python, 2021
We could simply take the covariance of this central subset of the data as the input for the PCA, but we may wish to identify (and perhaps discard) the outliers. We might look for points which are far from the mean, but this needs to take account of how the data varies. Mahalanobis distance is a measure of the distance of a point from the mean that adjusts for the correlation in the data. It is defined as d_i^2 = (x_i − μ)^T Σ^{−1} (x_i − μ).
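As a quick numerical check of this definition, the quadratic form above can be compared against SciPy's `mahalanobis` helper, which expects the inverse covariance matrix and returns the (unsquared) distance; the data here are synthetic:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(1)
# Correlated 2-D data: standard normals pushed through a nonsingular matrix
A = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
mu = A.mean(axis=0)
VI = np.linalg.inv(np.cov(A, rowvar=False))   # inverse covariance

x = A[0]
d = mahalanobis(x, mu, VI)                    # sqrt of the quadratic form
d2_direct = (x - mu) @ VI @ (x - mu)          # the definition, written out
assert np.isclose(d**2, d2_direct)
```

Squaring SciPy's result recovers the d_i^2 of the definition exactly.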
Automatic clustering-based approach for train wheels condition monitoring
Published in International Journal of Rail Transportation, 2022
Araliya Mosleh, Andreia Meixedo, Diogo Ribeiro, Pedro Montenegro, Rui Calçada
Generally, the literature assumes that the squared Mahalanobis distance can be approximated by a chi-squared distribution in n-dimensional space. Thus, the Mahalanobis distance can be approximated by a Gaussian distribution, and an outlier analysis can be performed based on a statistical threshold. Under this hypothesis, a confidence boundary (CB) for detecting a damage index that constitutes an outlier can be estimated with the Gaussian inverse cumulative distribution function (ICDF), considering the mean value and standard deviation of the baseline feature vector, and a level of significance α, in the form of:
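The CB equation itself is not reproduced in this excerpt. A common form, assumed here, evaluates the Gaussian ICDF at 1 − α with the baseline mean and standard deviation; the sketch below uses synthetic baseline values and an assumed α = 0.05:

```python
import numpy as np
from scipy import stats

# Baseline damage indices (e.g., Mahalanobis distances from healthy runs);
# the values here are synthetic placeholders.
rng = np.random.default_rng(2)
baseline_di = rng.normal(loc=5.0, scale=1.2, size=1000)

alpha = 0.05                                 # level of significance (assumed)
mu = baseline_di.mean()
sigma = baseline_di.std(ddof=1)

# One-sided upper confidence boundary via the Gaussian ICDF
cb = stats.norm.ppf(1 - alpha, loc=mu, scale=sigma)

new_di = 9.0                                 # a new damage index to classify
print("outlier" if new_di > cb else "within baseline")   # prints "outlier"
```

Any damage index exceeding the boundary is flagged; raising α tightens the boundary and flags more points.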
Multivariate fault detection for residential HVAC systems using cloud-based thermostat data, part I: Methodology
Published in Science and Technology for the Built Environment, 2022
Fangzhou Guo, Austin P. Rogers, Bryan P. Rasmussen
The same KDE method can be applied to multivariate statistics, but in a high-dimensional space with thousands of systems the computational load is very high. Therefore, the KDE of the squared Mahalanobis distance is applied as a metric to find outliers within each subset of operational conditions. The Mahalanobis distance d_M(x) is a measure of the distance between a point x and the mean of the multidimensional distribution (Mahalanobis 1936). The distance is zero if x is exactly at the mean of the distribution and grows as x moves away along each principal component axis. Compared to the Euclidean distance, the Mahalanobis distance is scale-invariant and also takes the correlations of the dataset into consideration. It is defined as d_M(x) = √((x − μ)^T Σ^{−1} (x − μ)), where μ and Σ are, respectively, the vector of means and the covariance matrix of the distribution. If Σ is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance.
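The reduction to the Euclidean distance when Σ is the identity matrix is easy to verify numerically:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
mu = np.array([0.0, 0.0, 0.0])

# With Sigma = I, the Mahalanobis distance equals the Euclidean distance
Sigma = np.eye(3)
d_m = np.sqrt((x - mu) @ np.linalg.inv(Sigma) @ (x - mu))
d_e = np.linalg.norm(x - mu)
print(d_m, d_e)   # both equal 3.0, since sqrt(1 + 4 + 4) = 3
```

With any non-identity Σ the two diverge, which is exactly the correlation adjustment described above.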
Scour at bridge piers in uniform and armored beds under steady and unsteady flow conditions using ANN-APSO and ANN-GA algorithms
Published in ISH Journal of Hydraulic Engineering, 2021
Samaneh Karkheiran, Abdorreza Kabiri-Samani, Maryam Zekri, Hazi M Azamathulla
The selection of input variables was based on a mutual information (MI) algorithm (Battiti 1994). The objective of this algorithm is to maximize the relevance between inputs and outputs and minimize the redundancy of the selected inputs (Kraskov et al. 2004). The next step in data selection was the identification of outliers. Outliers may arise due to data errors, noise, anomalies, improper sensors, or data-entry errors (Bartkowiak and Szustalewicz 1997). The search for outliers is based on the location and spread of the data. In the present study, to detect outliers, we applied the Mahalanobis distance procedure (Gnanadesikan 1977). The input and output data were normalized for training and testing to range between 0 and 1, applying Equation (14):
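Equation (14) itself is not reproduced in this excerpt; the sketch below assumes the standard min-max form x' = (x − x_min)/(x_max − x_min), which linearly maps a variable onto [0, 1]:

```python
import numpy as np

def minmax_normalize(x):
    """Scale a 1-D array linearly onto [0, 1] (standard min-max form,
    assumed here since Equation (14) is not reproduced in the excerpt)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

scaled = minmax_normalize([2.0, 4.0, 6.0])   # -> 0.0, 0.5, 1.0
```

In practice the minimum and maximum are taken from the training set only and reused for the test set, so that test data are scaled consistently.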