Clustering Divide and Conquer
Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022
In the previous section, the DBSCAN variant HDBSCAN was mentioned in passing. There is also a traditional version of hierarchical clustering. Unlike k-means clustering and DBSCAN (which yield distinct groups, in which group membership is mutually exclusive), the hierarchical approach yields nested groupings at multiple levels of granularity. Specifically, a group of data points can stick together to form a small cluster, and then several smaller clusters can be combined to establish a bigger group. Conversely, a big group can be broken down into smaller clusters. The result is easy to interpret because the hierarchy of clusters is presented in a graph called a dendrogram. Additionally, unlike k-means, the hierarchical method does not require pre-specifying the number of clusters. There are two types of hierarchical clustering: agglomerative and divisive.
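To make this concrete, here is a minimal sketch, assuming SciPy and made-up two-blob data (not code from the chapter), that builds an agglomerative hierarchy and draws its dendrogram:

```python
# Minimal sketch (illustrative data): agglomerative hierarchical clustering
# with SciPy, visualized as a dendrogram.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Two loose blobs of 2-D points.
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])

# Agglomerative clustering: each point starts as its own cluster,
# and the closest clusters are merged step by step.
Z = linkage(X, method="average")

# The dendrogram shows the full hierarchy of merges.
dendrogram(Z)
plt.xlabel("data point index")
plt.ylabel("merge distance")
plt.show()
```

Cutting the dendrogram at any height recovers a flat partition, which is why the number of clusters need not be fixed in advance.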
Building Models to Support Augmented Intelligence
Published in Judith Hurwitz, Henry Morris, Candace Sidner, Daniel Kirsch, Augmented Intelligence, 2019
Hierarchical clustering also creates clusters/categories, but these have a hierarchical structure. In this case, the top of the hierarchy comprises the most general categories, and the subclusters contain more specific categories. Hierarchical clustering begins with many small clusters and proceeds to merge them based on the proximity of cluster members to each other. Which of these two clustering techniques to choose depends on the type of result the data scientist wants for the problem at hand. If you are trying to determine the best ads to display to a user of an online site with a general category such as clothing or furniture, then k-means clustering is the better choice. But if the company wants more specific categories as well as general ones (e.g., furniture in general as well as kitchen tables), then hierarchical clustering will be useful.
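A brief sketch of this choice, assuming scikit-learn and synthetic data (none of this comes from the book): k-means produces one flat set of general categories, whereas cutting an agglomerative hierarchy at two depths yields both general and specific ones.

```python
# Hedged sketch: flat k-means labels vs. a hierarchy cut at two levels.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, (20, 2)) for c in (0, 4, 8, 12)])

# Flat clustering: one fixed set of general categories.
general = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Hierarchical clustering: cutting the hierarchy at two levels gives both
# general (2) and more specific (4) categories.
coarse = AgglomerativeClustering(n_clusters=2).fit_predict(X)
fine = AgglomerativeClustering(n_clusters=4).fit_predict(X)
print(general[:5], coarse[:5], fine[:5])
```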
Unsupervised Learning
Published in Dirk P. Kroese, Zdravko I. Botev, Thomas Taimre, Radislav Vaisman, Data Science and Machine Learning, 2019
Many different types of hierarchical clustering can be performed, depending on how the distance is defined between two data points and between two clusters. Denote the data set by X = {x_i, i = 1, …, n}. As in Section 4.6, let dist(x_i, x_j) be the distance between data points x_i and x_j. The default choice is the Euclidean distance dist(x_i, x_j) = ‖x_i − x_j‖.
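As a quick illustration of these definitions (a sketch assuming NumPy/SciPy, not code from the book), pdist computes dist(x_i, x_j) for all pairs, with the Euclidean distance as the default metric:

```python
# Pairwise distances for a tiny data set; Euclidean is SciPy's default.
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

d = pdist(X)       # condensed vector of Euclidean distances
D = squareform(d)  # full n-by-n distance matrix
print(D[0, 1])     # dist(x_1, x_2) = ||(3,4) - (0,0)|| = 5.0

# A different choice of metric changes the hierarchy that gets built.
d_city = pdist(X, metric="cityblock")
```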
An Ensemble Clustering Framework Based on Hierarchical Clustering Ensemble Selection and Clusters Clustering
Published in Cybernetics and Systems, 2023
Wenjun Li, Zikang Wang, Wei Sun, Sara Bahrami
The main idea of clustering is to separate the samples into groups of similar items. This means that similar samples should be placed in the same group, so that samples from different groups differ as much as possible (Mojarad et al. 2021). In general, clustering algorithms can be divided into two categories: partitioning methods and hierarchical methods (Zhou et al. 2022). The purpose of partitioning methods is to cluster the data so that the data within one cluster have the shortest distance from each other and the greatest distance from the data of other clusters. The most popular partitioning clustering methods are K-means, C-means, and Fuzzy C-means (FCM) (Shahidinejad, Ghobaei-Arani, and Masdari 2021; Tao et al. 2020; Huang et al. 2021). Hierarchical methods are a procedure for converting an adjacency matrix into a sequence of nested partitions, represented as a dendrogram (Shahidinejad, Ghobaei-Arani, and Masdari 2021; Abualigah et al. 2022). Each level of the dendrogram represents one clustering of the data, which makes it possible to choose the number of clusters appropriately. Hierarchical clustering methods fall into two categories: agglomerative and divisive (Tao et al. 2020).
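A small sketch of this property, assuming SciPy and made-up data (not taken from the article): cutting the same linkage tree at different levels yields partitions with different numbers of clusters.

```python
# Each level of the dendrogram corresponds to one partition of the data;
# fcluster cuts the tree to recover a chosen number of clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, (5, 2)) for c in (0, 3, 6)])

Z = linkage(X, method="ward")

# Cutting lower in the tree yields more, smaller clusters; cutting higher
# yields fewer, larger ones. This is how the cluster count can be chosen
# from the dendrogram after the fact.
for k in (2, 3, 5):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, np.unique(labels).size)
```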
Segmenting the target audience for transportation demand management programs: An investigation between mode shift and individual characteristics
Published in International Journal of Sustainable Transportation, 2023
Meiyu (Melrose) Pan, Alyssa Ryan
Individuals who display similar patterns of responses across a collection of variables are identified using segmentation algorithms. One of the most common approaches to audience segmentation is clustering. Clustering algorithms reduce within-group disparities while maximizing differences between groups. K-means clustering has been used to divide travelers into different groups with diverse travel behaviors and sensitivity to incentives (Arian et al., 2021; M. Lee et al., 2022; Mendiate et al., 2020). Another clustering algorithm, hierarchical clustering, was used to cluster a list of users who follow the AJ+ (an online news channel) Facebook page based on their gender, age, country, and the URLs they shared (An et al., 2017). The benefit of hierarchical clustering is that it produces a tree-based representation containing a nested sequence of clusters, without requiring the number of clusters as input or any prior knowledge. As a result, it is suitable for situations in which the number of clusters needs to be adjusted based on user input and clusters need to be found on demand. It is, however, more vulnerable to outliers than the K-means algorithm, making it less robust when new data is introduced (B, 2020).
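One way this "on demand" use can look in practice (a hedged sketch assuming scikit-learn; the threshold and data are made up, not from the paper): the tree is cut at a distance threshold instead of a pre-specified number of clusters.

```python
# Clusters without a pre-specified k: with n_clusters=None, scikit-learn's
# AgglomerativeClustering cuts the tree at a distance threshold instead.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.4, (15, 2)) for c in (0, 5)])

# The threshold can be adjusted based on user input; no prior knowledge
# of the number of clusters is needed.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=3.0)
labels = model.fit_predict(X)
print("clusters found:", model.n_clusters_)
```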
Clustering driver behavior using dynamic time warping and hidden Markov model
Published in Journal of Intelligent Transportation Systems, 2021
Ying Yao, Xiaohua Zhao, Yiping Wu, Yunlong Zhang, Jian Rong
Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of groups. The algorithm proceeds in the following steps. First, the distance matrix between every pair of series is computed as input; in this study, the distance is calculated by DTW. The linkage criterion for clustering in this study is the Ward method (Ward, 1963). Assuming there are n sets, the clustering method reduces the n sets to n − 1 mutually exclusive sets by considering all n(n − 1)/2 possible pairwise unions and selecting the union with the optimal value of the objective function that reflects the criterion chosen by the investigator. By repeating this process until only one group remains, the complete hierarchical structure and a quantitative estimate of the loss associated with each stage of the grouping can be obtained.
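A simplified sketch of this procedure (assumptions: SciPy's linkage, toy random series, and a plain Euclidean distance matrix standing in for the DTW distances used in the study):

```python
# Simplified stand-in for the study's pipeline: a condensed distance matrix
# (DTW in the study, Euclidean here) fed to Ward-linkage clustering.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(4)
series = rng.normal(size=(6, 50))  # 6 toy "series" of length 50

# Step 1: condensed matrix of pairwise distances between series.
d = pdist(series)

# Step 2: Ward linkage repeatedly merges the pair of clusters whose union
# gives the smallest increase in within-cluster variance, until one remains.
Z = linkage(d, method="ward")

# Each row of Z records one merge and its cost -- the "loss" at that stage.
print(Z[:, 2])  # merge distances, nondecreasing
```

Note that SciPy's Ward linkage is defined for Euclidean distances; substituting DTW distances, as the study does, trades that guarantee for a similarity measure better suited to time series.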