Clustering
Published in Text Mining with Machine Learning, 2019
Jan Žižka, František Dařena, Arnošt Svoboda
where |C| is the number of documents in cluster C. A cluster can also be represented by one of the documents contained in it, for example by its medoid (the document closest to the centroid). All of this depends on the clustering algorithm applied. For instance, some algorithms measure the similarity between two clusters in terms of the similarity of their two most similar objects (the single linkage method) or of their two most dissimilar objects (the complete linkage method). In the k-means algorithm, objects are moved to the closest cluster, which is determined by their proximity to the cluster centroids. In agglomerative centroid methods, each cluster is represented by a centroid and the similarity is calculated between these centroids [230, 128].
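To make these representation and linkage options concrete, the following is a minimal sketch (not taken from the chapter) showing how a centroid and a medoid can be computed for a cluster of document vectors, and how single-linkage and complete-linkage similarities between two clusters can be evaluated with cosine similarity. The function names and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two document vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def centroid(cluster):
    # Mean vector of all documents in the cluster (|C| rows).
    return cluster.mean(axis=0)

def medoid(cluster):
    # The document in the cluster that is closest to the centroid.
    c = centroid(cluster)
    dists = np.linalg.norm(cluster - c, axis=1)
    return cluster[np.argmin(dists)]

def single_linkage_sim(cluster_a, cluster_b):
    # Similarity of the two MOST similar objects across the clusters.
    return max(cosine_sim(a, b) for a in cluster_a for b in cluster_b)

def complete_linkage_sim(cluster_a, cluster_b):
    # Similarity of the two MOST dissimilar objects across the clusters.
    return min(cosine_sim(a, b) for a in cluster_a for b in cluster_b)
```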
Association Rule Mining in Big Datasets Using Improved Cuckoo Search Algorithm
Published in Cybernetics and Systems, 2023
Data is grouped into clusters by clustering, which aims to make the data within each cluster as similar as possible while keeping the similarity between clusters as low as possible (Jadhav and Gomathi 2019). The two commonly used types of clustering are partitional clustering and hierarchical clustering. Complete linkage, single linkage, average linkage, and centroid linkage clustering are all types of hierarchical clustering; K-means and fuzzy K-means are among the partitional methods (Sumathi and Sivanandam 2006). The simplest and most commonly used clustering algorithm is K-means. Before the algorithm is run, the value of k must be supplied. The algorithm then randomly selects k centre points and assigns each data point to the cluster of its nearest centre by computing the distance between the point and each of the randomly selected centres. A new centre point is then computed for each cluster, and the data are re-assigned using these new centres. Repeating this process causes the centres to stabilize (Pham and Afify 2007). The benefit of K-means is that it computes quickly and efficiently even for a large number of groups. Hence, the K-means clustering model is utilized in our ARM-based research work.
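Below is a minimal sketch of the K-means procedure described above (random selection of k centres, assignment of each point to its nearest centre, recomputation of the centres, and repetition until the centres stabilize). It is an illustrative implementation rather than the one used in the cited work; the function name and the use of NumPy are assumptions.

```python
import numpy as np

def k_means(data, k, max_iter=100, seed=0):
    """Cluster `data` (n_samples x n_features) into k groups."""
    rng = np.random.default_rng(seed)
    # Randomly select k data points as the initial centre points.
    centres = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each point to the cluster of its nearest centre.
        dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centre as the mean of its assigned points.
        new_centres = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        # Stop once the centres have stabilized.
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres
```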
A novel approach to text clustering using genetic algorithm based on the nearest neighbour heuristic
Published in International Journal of Computers and Applications, 2022
D. Mustafi, A. Mustafi, G. Sahoo
Apart from the hierarchical and partitional methods, several other clustering techniques, such as random sampling methods [43], condensation-based clustering [44], density-based methods [26], grid-based methods [45] and linkage-based methods [46], are often used to group objects into clusters. In the Random Sampling Clustering algorithm [34], a random sample of the original data set is used to create the clusters instead of the entire data set. Condensation-based approaches such as BIRCH use the notion of a clustering feature to represent a cluster. Grid-based clustering takes a hierarchical approach to the problem and clusters by breaking the domain space into small cells. Density-based clustering generates arbitrarily shaped clusters using the concepts of a user-specified minimum number of points (minPts) and an epsilon neighbourhood. Connectivity-based clustering, such as single-linkage clustering [47], complete-linkage clustering [48], average-linkage clustering [27] and UPGMA (Unweighted Pair Group Method with Arithmetic mean) [49], uses a linkage criterion for the formation of clusters. All these techniques generally assign each data object to exactly one cluster, which is referred to as hard or crisp clustering. One promising technique that has gained prominence in recent times is spectral clustering [50], which uses the eigenvalues of a similarity matrix to decompose the data set into clusters. Graph-theoretic clustering [51] is based on the Minimum Spanning Tree algorithm.
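As one concrete illustration of the connectivity-based (linkage) family mentioned above, the following sketch uses SciPy's hierarchical clustering routines to compare single, complete and average (UPGMA) linkage on the same data. The synthetic data, variable names, and choice of SciPy are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Small synthetic data set: two loose groups of 2-D points.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (10, 2)),
                  rng.normal(5, 0.5, (10, 2))])

# Each linkage criterion defines inter-cluster distance differently:
#   'single'   -> distance of the two closest objects
#   'complete' -> distance of the two farthest objects
#   'average'  -> mean pairwise distance (UPGMA)
for method in ("single", "complete", "average"):
    Z = linkage(data, method=method)                  # hierarchical merge tree
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
    print(method, labels)
```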
Improved similarity coefficient and clustering algorithm for cell formation in cellular manufacturing systems
Published in Engineering Optimization, 2020
Lang Wu, Li Li, Lijing Tan, Ben Niu, Ran Wang, Yuanyue Feng
Many researchers have developed similarity coefficient methods (SCMs) and clustering algorithms to solve the cell formation (CF) problem from different perspectives, such as operation sequence, process routing and production volume. Non-binary part–machine incidence matrices (PMIMs) have been presented to incorporate these realistic production factors. In addition, clustering algorithms have been developed to overcome the limitations of classical single-linkage clustering (McAuley 1972), average-linkage clustering (Seifoddini 1989), complete-linkage clustering (Gupta and Seifoddini 1990) and the p-median model (Kusiak 1985), which may result in improper cell formation and higher inter-cellular movements.