T-distributed stochastic neighbor embedding – Knowledge and References

Explore chapters and articles related to this topic

Process Monitoring

Published in Jose A. Romagnoli, Ahmet Palazoglu, Introduction to Process Control, 2020

One technique for visualizing high-dimensional data is t-distributed Stochastic Neighbor Embedding (t-SNE) [22], which converts similarities between data points to joint probabilities and attempts to minimize the divergence between the joint probabilities of the low-dimensional embedding and those of the high-dimensional data. Self-Organizing Maps (SOMs) [23], which are based on a neural network architecture, are also used for data visualization. The versatility of SOMs comes from their topology preservation and data representation abilities, which help isolate the key variables and patterns that emerge from the data. These tools will be illustrated by an example in the next section.
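For reference, the objective described in this excerpt can be written out as below. This follows the standard formulation from the original t-SNE paper (van der Maaten and Hinton 2008), where the divergence is the Kullback-Leibler divergence. The symbols are notation introduced here and do not come from the excerpt: x_i are the high-dimensional points, y_i their low-dimensional counterparts, sigma_i a per-point Gaussian bandwidth set by the perplexity, and N the number of points.

```latex
% Similarities in the high-dimensional space (Gaussian kernel, then symmetrized)
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N}

% Similarities in the low-dimensional map (Student-t kernel, one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

% Cost minimized over the map points y_1, \dots, y_N
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The heavy-tailed Student-t kernel in the map is what gives the method the "t" in its name: it lets moderately dissimilar points sit far apart in the map without a large penalty.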

Dimension Reduction Breaking the Curse of Dimensionality

Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022

Chong Ho Alex Yu

T-distributed Stochastic Neighbor Embedding (t-SNE) is a modern dimension reduction algorithm developed by van der Maaten and Hinton (2008). In t-SNE, high-dimensional data are reduced into a low-dimensional map, yet the original information is retained or minimally distorted, in the sense that similar response patterns are kept together while dissimilar instances are kept apart. Prior research shows that t-SNE considerably reduces computational time, and its accuracy is better than that of traditional methods (Cardona et al. 2020; van der Maaten and Hinton 2008). The major differences between PCA and t-SNE are as follows (Platzer 2013):

Like many other classical procedures, PCA is deterministic: given the same data set, the program yields exactly the same output. Like many other data science methods, t-SNE is probabilistic, meaning that the output of each run can differ from the previous one, even though the same data set is used.

PCA assigns the data into n components, where n is the number of variables. For example, if there are 20 variables in the data set, there will be 20 potential components. By contrast, t-SNE collapses all information into fewer components.

PCA uses an orthogonal linear transformation, whereas t-SNE uses a nonlinear reduction and its components are not restricted to be orthogonal.

PCA preserves the global structure of the data, whereas t-SNE preserves the local structure by minimizing the Kullback-Leibler divergence (KL divergence), which is a measure of dissimilarity between distributions.

PCA is a traditional procedure without options for tuning hyperparameters, whereas in t-SNE users can tune hyperparameters such as the learning rate.

PCA is highly affected by outliers, while t-SNE is far less sensitive to them.
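Two of the differences listed above (deterministic versus probabilistic output, and the number of components) can be checked with a small experiment. The following is a minimal sketch, assuming scikit-learn is available; the synthetic data, the seeds, and the perplexity and learning-rate values are illustrative choices and do not come from the chapter.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic data: 60 observations of 5 variables (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))

# PCA is deterministic and, by default, keeps as many components
# as there are variables (here 5).
pca_a = PCA().fit_transform(X)
pca_b = PCA().fit_transform(X)
n_pca_components = pca_a.shape[1]
pca_repeatable = np.allclose(pca_a, pca_b)


def run_tsne(seed):
    """Fit t-SNE with a random start so the seed dependence is explicit."""
    model = TSNE(
        n_components=2,       # collapse to a 2-D map
        perplexity=10.0,      # assumed value; must be below the sample count
        learning_rate=200.0,  # one of the tunable hyperparameters
        init="random",
        random_state=seed,
    )
    embedding = model.fit_transform(X)
    return embedding, model.kl_divergence_


emb_a, kl_a = run_tsne(seed=1)
emb_b, kl_b = run_tsne(seed=2)
tsne_repeatable = np.allclose(emb_a, emb_b)

print("PCA components:", n_pca_components, "| repeatable:", pca_repeatable)
print("t-SNE map shape:", emb_a.shape, "| repeatable:", tsne_repeatable)
print("final KL divergence per run:", kl_a, kl_b)
```

Fixing `random_state` to the same value in both runs makes t-SNE reproducible, which is the usual way to obtain stable figures in practice.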

Pipeline leak diagnosis based on leak-augmented scalograms and deep learning

Published in Engineering Applications of Computational Fluid Mechanics, 2023

Muhammad Farooq Siddique, Zahoor Ahmad, Jong-Myon Kim

The main contributions of this study are:

To visualize and distinguish leak features from noisy data, time-domain acoustic emission (AE) signals were converted to images using the continuous wavelet transform (CWT). A Gaussian filter was applied to the noisy image to smooth it, followed by a Laplacian filter for edge detection, which results in leak-augmented scalograms. These leak-augmented scalograms revealed leak-related characteristics.

A new approach is developed which utilizes leak-augmented scalograms and a CNN-CAE deep neural framework for pipeline health diagnosis. Additionally, the proposed method is designed to be effective in detecting leaks in pipelines transporting various types of fluids such as water and gases. To the best of the authors’ knowledge, this approach has not been reported for pipeline leak diagnosis in the literature.

The t-distributed stochastic neighbour embedding (t-SNE) is a dimensionality reduction technique commonly used in machine learning for visualizing high-dimensional data (Erfani & Goharian, 2023; Van der Maaten & Hinton, 2008). In this study, its goal is to preserve the local and global data structures while mapping the complicated, nonlinear connections between the data points into a two- or three-dimensional space.

Data from a real industrial pipeline testbed developed at lab scale were used to validate the proposed method.
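The smoothing and edge-detection steps described above can be sketched as follows. This is a minimal illustration, assuming SciPy is available, and is not the authors' implementation: the CWT step is skipped, a synthetic array stands in for a real scalogram, and the filter width `sigma=2.0` is an assumed value.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Hypothetical stand-in for a scalogram (rows = scales, columns = time):
# one bright horizontal band plus additive noise.
clean = np.zeros((64, 128))
clean[28:36, :] = 1.0
noisy = clean + rng.normal(scale=0.3, size=clean.shape)

# Step 1: Gaussian filter to smooth the noisy image (sigma is assumed).
smoothed = ndimage.gaussian_filter(noisy, sigma=2.0)

# Step 2: Laplacian filter on the smoothed image for edge detection.
edges = ndimage.laplace(smoothed)

# Row with the strongest average edge response; it is expected to lie
# near the borders of the band rather than in the flat background.
strongest_row = int(np.abs(edges).mean(axis=1).argmax())

print("background std before smoothing:", noisy[:10].std())
print("background std after smoothing: ", smoothed[:10].std())
print("row with strongest edge response:", strongest_row)
```

Smoothing before the Laplacian matters because a second-derivative filter amplifies pixel-level noise; applying it to the raw image would bury the band edges.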

A region-specific clustering approach to investigate risk-factors in mortality rate during COVID-19: comprehensive statistical analysis from 208 countries

Published in Journal of Medical Engineering & Technology, 2021

Poojita Garg, Deepak Joshi

To cluster the countries into groups, k-means, a popular unsupervised machine learning algorithm, was used via Scikit Learn. The above-mentioned 13 common predictors for each country were used as input to our model. The elbow test depicted in Figure 1 was used to estimate the number of categories best suited to our data. Because the data set is small, values near the number of categories suggested by the elbow test provide similar information, which makes visual inspection necessary. For this visual inspection, t-Distributed Stochastic Neighbour Embedding (t-SNE) was used, a method for visualising high-dimensional datasets by reducing them to a small number of dimensions. It is similar to principal component analysis (PCA), except that PCA is a mathematical technique whereas t-SNE is a probabilistic one. The principle followed by t-SNE is to minimise the divergence between the distribution that measures pairwise similarities of the input objects and the distribution that measures pairwise similarities of the corresponding low-dimensional data points. A silhouette score was used to assess the consistency within clusters.
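The workflow described above (elbow test, k-means, silhouette score, then a t-SNE map for visual inspection) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic blobs shaped like the study's input (208 rows, 13 predictors), and the standardisation step, the choice of three clusters, and all hyperparameter values are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the country table: 208 rows, 13 predictors.
X, _ = make_blobs(n_samples=208, n_features=13, centers=3, random_state=0)
X = StandardScaler().fit_transform(X)  # assumed preprocessing step

# Elbow test: within-cluster sum of squares (inertia) for each candidate k.
inertias = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_

# Cluster with the k suggested by the elbow (assumed to be 3 here).
k_chosen = 3
labels = KMeans(n_clusters=k_chosen, n_init=10, random_state=0).fit_predict(X)

# Silhouette score: consistency within clusters, in the range [-1, 1].
score = silhouette_score(X, labels)

# 2-D t-SNE map for visual inspection of the clusters.
embedding = TSNE(
    n_components=2,
    perplexity=30.0,
    learning_rate=200.0,
    init="random",
    random_state=0,
).fit_transform(X)

print("inertia by k:", {k: round(v, 1) for k, v in inertias.items()})
print("silhouette score:", score)
print("t-SNE embedding shape:", embedding.shape)
```

Plotting `embedding[:, 0]` against `embedding[:, 1]`, coloured by `labels`, gives the kind of picture used to compare neighbouring values of k when the elbow is ambiguous.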