Alternative Clustering Analysis: A Review
Published in Charu C. Aggarwal, Chandan K. Reddy, Data Clustering, 2018
Let C1 = {c1, …, ck} and C2 = {c′1, …, c′k′} be two clusterings of D. The similarity between C1 and C2 may be measured using a function Sim : CD × CD → [0,1], where higher values indicate higher similarity. A number of measures are available for this purpose, including the Rand Index [33], Adjusted Rand Index [21], Jaccard Index [18], Normalized Mutual Information [24], and Adjusted Mutual Information [36]. Measuring similarity between clusterings is important, since it gives the user insight into the relationship between them. When managing multiple clusterings, assessing similarity may allow removal of redundant clusterings, selection of interesting clusterings, or a better understanding of how clusterings evolve. It is also a key step when exploring the convergence properties of a clustering algorithm or comparing its output against an expert-generated clustering.
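As a minimal sketch of how such a similarity function behaves, the example below uses scikit-learn's implementations of three of the measures listed above on two made-up label vectors. One useful property all three share is invariance to the arbitrary numbering of clusters: two identical partitions whose clusters are merely renamed score a perfect 1.

```python
import numpy as np
from sklearn.metrics import (
    adjusted_rand_score,
    normalized_mutual_info_score,
    adjusted_mutual_info_score,
)

# Two clusterings of the same dataset D, encoded as label vectors
# (hypothetical labels, for illustration only).
c1 = np.array([0, 0, 1, 1, 2, 2])
c2 = np.array([2, 2, 0, 0, 1, 1])  # the same partition, clusters renamed

# All three measures equal 1 for identical partitions, regardless of
# how the clusters happen to be numbered.
print(adjusted_rand_score(c1, c2))           # 1.0
print(normalized_mutual_info_score(c1, c2))  # 1.0
print(adjusted_mutual_info_score(c1, c2))    # 1.0
```

This label-permutation invariance is exactly why such indices are used instead of naive per-label accuracy when comparing clusterings.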
Revealing representative day-types in transport networks using traffic data clustering
Published in Journal of Intelligent Transportation Systems, 2023
Matej Cebecauer, Erik Jenelius, David Gundlegård, Wilco Burghout
Spectral clustering is a graph partitioning algorithm originally designed for image segmentation (Shi & Malik, 2000), but it is also widely used in transport network partitioning (Ji & Geroliminis, 2012; Lopez et al., 2017; Saeedmanesh & Geroliminis, 2016). The approach proposed by Shi and Malik (2000) is based on concepts from spectral graph theory and uses eigenvalues and eigenvectors to bipartition the graph. Using the nearest-neighborhood graph of dataset X, the problem is equivalent to finding the minimal cut in the graph; this variant of spectral clustering can also use a precomputed affinity or similarity matrix. The second variant, used in Cebecauer et al. (2019), uses a similarity matrix calculated as the cosine similarity between two network-day vectors. The third variant, introduced in Lopez et al. (2017), uses a similarity matrix calculated with adjusted mutual information (AMI). First, k-means spatio-temporal clustering is performed for each day vector. The AMI then calculates the mutual information between two days using the spatio-temporal cluster label information instead of the absolute values. Furthermore, the original methodology has mostly been used for speeds and connected networks (Chiabaut & Faitout, 2021; Krishnakumari et al., 2020; Lopez et al., 2017).
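A minimal sketch of the precomputed-affinity variant with a cosine-similarity matrix, in the spirit of the second variant described above. The synthetic "network-day" vectors, group sizes, and parameter values here are illustrative assumptions, not the data or implementation of the cited studies; scikit-learn's SpectralClustering stands in for the original methodology.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Hypothetical data: three day-type profiles, ten network-days each.
# Each vector holds nonnegative link speeds, so cosine similarity is a
# valid affinity in [0, 1].
profiles = rng.uniform(20.0, 80.0, size=(3, 100))
days = np.vstack(
    [p + rng.normal(0.0, 2.0, size=(10, 100)) for p in profiles]
)

# Similarity matrix: cosine similarity between every pair of day vectors.
affinity = cosine_similarity(days)

# Spectral clustering on the precomputed similarity matrix.
model = SpectralClustering(
    n_clusters=3, affinity="precomputed", random_state=0
)
labels = model.fit_predict(affinity)

print(labels.shape)  # (30,)
```

The AMI-based third variant would follow the same pattern, replacing the cosine-similarity matrix with a matrix of pairwise AMI scores computed on per-day cluster labels.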
An improved density peaks clustering algorithm by automatic determination of cluster centres
Published in Connection Science, 2022
Hui Du, Yanting Hao, Zhihe Wang
The clustering results were compared with those of several clustering algorithms, including K-means (Jain, 2010), FCM (Fu, 1998), DBSCAN (Ester et al., 1996), FDP, AP (Frey & Dueck, 2007), DFDP, HFDP, DPSLC, and McDPC. The evaluation metrics are the Adjusted Rand Index (ARI) (Vinh et al., 2010), Normalized Mutual Information (NMI) (Strehl & Ghosh, 2002), and Adjusted Mutual Information (AMI) (Vinh et al., 2010). For all three indicators, values closer to 1 indicate better clustering performance.
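To illustrate how such a comparison is scored, the toy sketch below evaluates scikit-learn's K-means against a known ground truth using the same three metrics. The synthetic blobs and parameters are assumptions for illustration, not the datasets or implementations from the paper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_rand_score,
    normalized_mutual_info_score,
    adjusted_mutual_info_score,
)

# Synthetic, well-separated data with known ground-truth labels.
X, y_true = make_blobs(
    n_samples=300, centers=3, cluster_std=0.6, random_state=42
)

# Cluster with K-means, then score the result against the ground truth.
y_pred = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# For all three indices, values closer to 1 mean closer agreement
# with the ground-truth partition.
for name, score in [
    ("ARI", adjusted_rand_score(y_true, y_pred)),
    ("NMI", normalized_mutual_info_score(y_true, y_pred)),
    ("AMI", adjusted_mutual_info_score(y_true, y_pred)),
]:
    print(f"{name}: {score:.3f}")
```

On clearly separated blobs like these, all three scores approach 1; on harder data, the gap between competing algorithms is what the comparison reports.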
Automated Creation of an Intent Model for Conversational Agents
Published in Applied Artificial Intelligence, 2023
Alberto Benayas, Miguel Angel Sicilia, Marçal Mora-Cantallops
V-Measure (Rosenberg and Hirschberg 2007) balances completeness and homogeneity but is not adjusted for chance. A random clustering with as many clusters as there are samples in the dataset will still obtain a score above zero, depending on the number of labels in the dataset. To overcome this problem, Adjusted Mutual Information (AMI) (Vinh, Epps, and Bailey 2010), a mutual-information-based metric, normalizes against chance.
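This chance-adjustment problem can be demonstrated directly with scikit-learn's metric implementations (the labels below are synthetic, chosen only to expose the effect): a degenerate clustering that puts every sample in its own cluster is perfectly homogeneous, so V-Measure stays well above zero even though the clustering carries no useful structure, while AMI corrects for this.

```python
import numpy as np
from sklearn.metrics import v_measure_score, adjusted_mutual_info_score

n = 99
# Ground truth: three equally sized classes (synthetic, for illustration).
y_true = np.repeat([0, 1, 2], n // 3)

# Degenerate clustering: as many clusters as samples.
y_pred = np.arange(n)

# Every singleton cluster is trivially pure, so homogeneity is perfect
# and V-Measure is pulled above zero despite zero practical value.
v = v_measure_score(y_true, y_pred)
# AMI subtracts the mutual information expected by chance, so this
# degenerate clustering scores ~0.
ami = adjusted_mutual_info_score(y_true, y_pred)

print(f"V-Measure: {v:.3f}")
print(f"AMI: {ami:.3f}")
```

This is why chance-adjusted metrics such as AMI (and ARI) are preferred when the number of clusters can vary between the methods being compared.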