Machine Learning
Published in Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane, Big Data and Social Science, 2020
Figure 7.3 shows the clusters that k-means would generate on the data set in the figure. It is obvious that the clusters produced are not the clusters you would want, and that is one drawback of methods such as k-means. Two points that are far away from each other will be put in different clusters even if there are other data points that create a “path” between them. Spectral clustering fixes that problem by clustering data that are connected but not necessarily compact (i.e., clustered within convex boundaries). Spectral clustering methods work by representing the data as a graph (or network), where data points are nodes in the graph and the edges (connections between nodes) represent the similarity between pairs of data points.
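The contrast described above can be reproduced in a few lines. The sketch below (an illustration, not the book's figure) runs k-means and spectral clustering on scikit-learn's two-moons data set, where the clusters are connected "paths" rather than compact blobs; the exact parameter values are illustrative choices.

```python
# Sketch: k-means vs. spectral clustering on non-convex clusters
# (two interleaved half-moons). Assumes scikit-learn is installed.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

# k-means assigns points by distance to centroids, so it splits each moon.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering builds a nearest-neighbour graph, so points joined
# by a "path" of close neighbours end up in the same cluster.
sc_labels = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", n_neighbors=10,
    random_state=0,
).fit_predict(X)

print("k-means ARI: ", adjusted_rand_score(y, km_labels))
print("spectral ARI:", adjusted_rand_score(y, sc_labels))
```

On this data the spectral labels recover the two moons almost perfectly, while k-means cuts each moon in half, matching the drawback discussed above.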
An Introduction to Cluster Analysis
Published in Charu C. Aggarwal, Chandan K. Reddy, Data Clustering, 2018
It should be noted that some models of clustering are more amenable to different data types than others. For example, some models depend only on the distance (or similarity) functions between records. Therefore, as long as an appropriate similarity function can be defined between records, cluster analysis methods can be used effectively. Spectral clustering is one class of methods which can be used with virtually any data type, as long as appropriate similarity functions are defined. The downside is that these methods scale quadratically with the number of records, since the similarity matrix has one entry for every pair of records. Generative models can also be generalized easily to different data types, as long as an appropriate generative model can be defined for each component of the mixture. Some common algorithms for categorical data clustering include CACTUS [38], ROCK [40], STIRR [39], and LIMBO [15]. A discussion of categorical data clustering algorithms is provided in Chapter 12.
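The point about arbitrary data types can be made concrete: once a symmetric similarity function is defined, spectral clustering never needs a vector representation. The sketch below clusters short strings via a user-defined similarity; both the toy records and the Jaccard-on-characters similarity are illustrative assumptions, not from the chapter.

```python
# Sketch: spectral clustering on non-numeric records through a
# user-defined similarity function and a precomputed affinity matrix.
import numpy as np
from sklearn.cluster import SpectralClustering

records = ["apple", "apples", "applet", "stone", "stones", "atone"]

def sim(a, b):
    # Jaccard similarity on character sets -- any symmetric,
    # non-negative similarity function would do here.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

n = len(records)
# The n x n similarity matrix is the quadratic cost mentioned above.
S = np.array([[sim(records[i], records[j]) for j in range(n)]
              for i in range(n)])

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(S)
print(labels)
```

The "apple"-like records land in one cluster and the "stone"-like records in the other, purely on the strength of the pairwise similarities.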
Modern machine learning techniques and their applications
Published in Amir Hussain, Mirjana Ivanovic, Electronics, Communications and Networks IV, 2015
Mirjana Ivanović, Miloš Radovanovic
The objective behind spectral clustering methods is to identify different groups of data points by finding local neighborhoods within a graph. There exist several viewpoints on how this objective should be achieved, resulting in different approaches to spectral clustering, and explanations as to why spectral clustering methods work: the graph cut point of view (partitioning the graph so that edges between different groups have low weights and edges within a group have high weights), the random walk point of view (through a stochastic process which jumps from node to node), and the perturbation theory point of view (examining the behavior of eigenvalues and eigenvectors under small changes, that is, perturbations, to the matrix) (von Luxburg 2007).
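The graph cut viewpoint can be illustrated directly: the sign pattern of the eigenvector for the second-smallest eigenvalue of the graph Laplacian (the Fiedler vector) approximates the minimum-weight cut. The toy weight matrix below is invented for illustration: two triangles of heavy edges joined by one weak edge.

```python
# Sketch of the graph-cut viewpoint via the Fiedler vector.
import numpy as np

# Weighted adjacency matrix: nodes 0-2 and 3-5 each form a triangle
# (weight 1 edges), linked only by a weak edge 0-3 (weight 0.1).
W = np.array([
    [0,   1, 1, 0.1, 0, 0],
    [1,   0, 1, 0,   0, 0],
    [1,   1, 0, 0,   0, 0],
    [0.1, 0, 0, 0,   1, 1],
    [0,   0, 0, 1,   0, 1],
    [0,   0, 0, 1,   1, 0],
], dtype=float)

D = np.diag(W.sum(axis=1))            # degree matrix
L = D - W                             # unnormalized graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)  # eigh: symmetric eigendecomposition
fiedler = eigvecs[:, 1]               # eigenvector of 2nd-smallest eigenvalue

partition = (fiedler > 0).astype(int) # sign split -> two groups
print(partition)                      # separates the two triangles
```

Thresholding the Fiedler vector at zero cuts exactly the weak edge, i.e. the low-weight cut the graph cut viewpoint describes; full spectral clustering generalizes this by running k-means on several eigenvectors.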
Nyström-based spectral clustering using airborne LiDAR point cloud data for individual tree segmentation
Published in International Journal of Digital Earth, 2021
Yong Pang, Weiwei Wang, Liming Du, Zhongjun Zhang, Xiaojun Liang, Yongning Li, Zuyuan Wang
Based on spectral graph theory, spectral clustering methods have recently shown attractive advantages in segmentation problems (Luxburg 2007). Unlike the classic k-means clustering approach, spectral clustering methods place no restrictions on the data distribution (Fowlkes et al. 2004). Spectral clustering has a similar theoretical foundation to the iterative two-class cut method (Reitberger et al. 2009), but offers better effectiveness in multiclass problems (Luxburg 2007). Heinzel and Huber (2018) introduced spectral clustering for tree segmentation using dense terrestrial laser scanning data, with good performance. The biggest challenge of the spectral clustering method for LiDAR point cloud data is computational complexity. For a data set with n points, O(n²) memory is needed to construct the similarity matrix, and the complexity of the eigenvector calculation is O(n³) (Lin and Cohen 2010; Ye et al. 2016). Besides being time consuming, the procedure risks memory overflow when n is large, as it is for large-scale dense LiDAR point cloud data.
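The Nyström idea behind such methods is to replace the n × n similarity matrix with a low-rank factor built from m ≪ n landmark points, cutting memory from O(n²) to O(nm). The sketch below is not the paper's algorithm; it only demonstrates the approximation step using scikit-learn's Nystroem transformer on synthetic data, with all parameter values chosen for illustration.

```python
# Rough sketch of the Nystroem low-rank approximation that makes
# spectral-style clustering feasible for large n.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=5000, centers=4, random_state=0)

# m = 100 landmarks: the feature matrix Z is n x m, so memory is
# O(n*m) instead of the O(n^2) full similarity matrix.
feat = Nystroem(kernel="rbf", gamma=0.5, n_components=100, random_state=0)
Z = feat.fit_transform(X)   # Z @ Z.T approximates the RBF kernel matrix

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(labels))  # cluster sizes
```

For n = 5000 the full similarity matrix would hold 25 million entries; the Nyström factor holds 500 thousand, which is exactly the trade-off motivating the approach for dense LiDAR point clouds.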
Factor decomposition and clustering analysis of CO2 emissions from China’s power industry based on Shapley value
Published in Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, 2020
Spectral clustering is a clustering method based on graph theory which uses dimensionality reduction and is therefore well suited to high-dimensional data. Hence, this study adopts spectral clustering to establish a clustering model for CO2 emissions analysis of the provincial power industry. There are two main steps in the spectral clustering process (Xia et al. 2009). The first step is to construct a network graph G(V,E). The 30 target provinces form the vertex set V; each vertex carries six sample values, namely the standardized contribution rates of the six factors to CO2 emissions. E is the set of edges, whose weights are given by a similarity matrix. The second step is to cut the graph constructed in the first step, making vertices within each subgraph as similar as possible and different subgraphs as dissimilar as possible.
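The two steps can be sketched end to end. The data below are synthetic stand-ins for the 30 provinces' six standardized factor-contribution rates (the real values are not reproduced here), and the RBF similarity and cluster count are illustrative assumptions, not the study's exact choices.

```python
# Sketch of the two-step process: (1) build a similarity graph over
# 30 vertices with 6 features each, (2) cut it via spectral clustering.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Synthetic stand-in: three groups of 10 "provinces", 6 factors each.
X = np.vstack([rng.normal(m, 0.3, size=(10, 6)) for m in (-2, 0, 2)])

X = StandardScaler().fit_transform(X)   # standardized contribution rates

# Step 1 (graph construction) uses an RBF similarity matrix as the edge
# weights; step 2 (the graph cut) is performed internally.
model = SpectralClustering(n_clusters=3, affinity="rbf", random_state=0)
labels = model.fit_predict(X)
print(labels)
```

Each group of ten synthetic provinces receives its own label, i.e. the cut keeps similar vertices inside one subgraph and separates dissimilar ones, as the two-step description requires.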
An adaptive clustering method detecting the surface defects on linear guide rails
Published in International Journal of Computer Integrated Manufacturing, 2019
Youhang Zhou, Zhuxi Ma, Xuanwei Shi, Kui Zhang
Second, in order to demonstrate the performance of the proposed method, some defects are separate and some intersect, and all of them can be accurately distinguished. The method proposed in this paper is compared with traditional clustering methods (K-means clustering, spectral clustering), as shown in Figure 7. Comparing Figure 7(a–c), it is apparent that K-means clustering uses the Euclidean distance between data points to measure their ‘similarity’ but ignores the underlying geometric structure formed by the data points. Spectral clustering performs better than K-means clustering and is very effective for data with nonlinear, non-intersecting structures. However, it still cannot distinguish data from different manifolds when they intersect. In contrast, it is clear in Figure 7(c) that defect data points on different manifold structures, including those that intersect each other, are accurately classified. In addition, our method is essentially a hybrid manifold clustering method, so it is necessary to compare it with other manifold clustering methods: K-means, K-manifolds, and mumCluster. The average clustering accuracy of each method over 30 independent randomized experiments is shown in Table 2.