Clustering Techniques
Published in Harry G. Perros, An Introduction to IoT Analytics, 2021
As we have seen, single-linkage clustering is sensitive to outliers, which can act as bridges between clusters and force them to merge. The clusters may contain many data points, but it only takes a few bridging data points to make them merge. The density-based spatial clustering of applications with noise (DBSCAN) algorithm avoids this problem by requiring that the density of the bridging data points match that of the clusters. An example of bridging data is shown in Figure 8.14. If we used the hierarchical clustering algorithm, the two clusters would have been merged into a single one. The DBSCAN algorithm merges them only if the density of the bridging data is the same as that of the clusters; if not, merging does not take place, and the bridging data points are simply treated as outliers.
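As a rough illustration of this behavior, the sketch below uses DBSCAN from scikit-learn on synthetic data; the two blobs, the sparse bridge, and the eps/min_samples values are illustrative assumptions, not taken from the book.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense clusters of 100 points each
cluster_a = rng.normal((0.0, 0.0), 0.3, size=(100, 2))
cluster_b = rng.normal((6.0, 0.0), 0.3, size=(100, 2))
# A sparse bridge: a few points strung out between the clusters
bridge = np.column_stack([np.linspace(1.0, 5.0, 5), np.zeros(5)])
X = np.vstack([cluster_a, cluster_b, bridge])

# With eps=0.5 the bridge points (spaced ~1 apart) are not dense enough
# to connect the clusters, so DBSCAN leaves them as noise (label -1)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(np.unique(labels))  # typically [-1, 0, 1]: two clusters plus noise
```

A single-linkage run on the same data would merge the two blobs through the bridge; DBSCAN keeps them apart because the bridge never reaches the clusters' density.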
Finding Clusters
Published in Wendy L. Martinez, Angel R. Martinez, Jeffrey L. Solka, Exploratory Data Analysis with MATLAB®, 2017
Single linkage clustering suffers from a problem called chaining. This comes about when clusters are not well separated, allowing snake-like chains to form. Observations at opposite ends of a chain can be very dissimilar, yet they end up in the same cluster. Another issue with single linkage is that it does not take the cluster structure into account (Everitt, Landau, and Leese, 2001).
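The chaining effect is easy to reproduce on synthetic data. The sketch below is a minimal illustration using SciPy; the "stepping-stone" layout, the random seed, and the cut height of 1.5 are assumptions chosen for the demonstration, not taken from the book.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two compact blobs, 10 units apart ...
blob_a = rng.normal((0.0, 0.0), 0.1, size=(5, 2))
blob_b = rng.normal((10.0, 0.0), 0.1, size=(5, 2))
# ... joined by a chain of stepping stones, each ~1 unit from the next
chain = np.column_stack([np.arange(1.0, 10.0), np.zeros(9)])
X = np.vstack([blob_a, blob_b, chain])

# Cut both dendrograms at the same height and count the clusters
for method in ('single', 'complete'):
    labels = fcluster(linkage(X, method=method), t=1.5, criterion='distance')
    print(method, '->', len(set(labels)), 'cluster(s)')
# Single linkage chains through the stepping stones, so the two very
# dissimilar blobs typically land in one cluster; complete linkage,
# which uses the farthest pair, resists the chain and splits them up.
```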
Electricity Consumption Behaviors and Clustering of Distribution Grids in Terms of Demand Response
Published in Electric Power Components and Systems, 2022
Umit Cetinkaya, Ezgi Avci, Ramazan Bayindir
In a clustering analysis, a criterion for grouping the data must first be defined; this criterion is usually a similarity or distance metric. The clustering technique to be used is then chosen. A distance metric is a non-negative function d(i, j) that gives the distance between observation vectors i and j. In practice, the Euclidean, Manhattan, and Minkowski distances are used to quantify the similarity among observations [72, 73]. As the literature review shows, clustering analysis can be performed with many different techniques and methods, but these can fundamentally be grouped into hierarchical and nonhierarchical techniques. Hierarchical clustering methods are easy to apply and give successful results for consumer-based or grid-level clustering analysis in electrical systems [74, 75].

Hierarchical clustering rests on the core idea that objects are more related to nearby objects than to objects farther away. It is divided into two types: agglomerative (bottom-up) and divisive (top-down). In agglomerative hierarchical clustering, each observation begins as its own cluster, and the closest clusters are merged, bottom-up, into larger ones until a termination condition is met; divisive hierarchical clustering proceeds from the top down, splitting a single all-encompassing cluster. In either case, the user can obtain any number of clusters by cutting the hierarchy at the appropriate level [45].

In the agglomerative hierarchical clustering algorithm, then, each observation starts as a cluster and the closest clusters are linked at each step. There are three main methods for this, based on similarity and distance measures: single linkage, complete linkage, and average linkage. Single linkage, also called the minimum or nearest-neighbor method, measures the similarity between two clusters by the distance between their nearest members; complete linkage uses the longest inter-cluster distance, and average linkage uses the average distance, as sketched below.
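The following sketch is a minimal illustration of these pieces using SciPy; the synthetic 24-dimensional observation vectors (standing in for, say, daily load profiles) and the choice of p = 3 for the Minkowski distance are assumptions made for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
# Two groups of synthetic observation vectors (hypothetical load profiles)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 24)),
               rng.normal(3.0, 0.5, size=(20, 24))])

# Pairwise distances d(i, j) under the three metrics named above
print('euclidean :', pdist(X, 'euclidean').mean().round(2))
print('manhattan :', pdist(X, 'cityblock').mean().round(2))
print('minkowski :', pdist(X, 'minkowski', p=3).mean().round(2))

# Agglomerative clustering with the three classical linkage criteria:
# single = nearest pair, complete = farthest pair, average = mean distance
d = pdist(X, 'euclidean')
for method in ('single', 'complete', 'average'):
    labels = fcluster(linkage(d, method=method), t=2, criterion='maxclust')
    print(method, 'cluster sizes:', np.bincount(labels)[1:])
```

On well-separated data like this, all three linkage criteria recover the same two groups; they diverge mainly when clusters are elongated, unevenly sized, or connected by intermediate points, as the earlier excerpts describe.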