K-means++ – Knowledge and References

Advance Object Detection and Clustering Techniques Used for Big Data

Published in Ankur Dumka, Alaknanda Ashok, Parag Verma, Poonam Verma, Advanced Digital Image Processing and Its Applications in Big Data, 2020

Ankur Dumka, Alaknanda Ashok, Parag Verma, Poonam Verma

The steps for choosing the value of K using the Elbow method can be summarized as follows:
1. Compute K-means clustering for different values of K, varying K from 1 to 10 clusters.
2. For each K, calculate the total within-cluster sum of squares (WCSS).
3. Plot the curve of WCSS versus the number of clusters K.
4. The location of a bend (the "elbow") in the plot indicates an appropriate number of clusters in the dataset.
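The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the toy data, the helper names (`squared_distance`, `kmeans_wcss`, `make_blobs`), the random initialization, and the use of five restarts per K are all assumptions made for the example. The plotting step is left out; the printed table is what one would plot.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_wcss(points, k, iterations=50, seed=0):
    """Run plain Lloyd-style K-means once and return the final WCSS."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers (an assumption)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # WCSS: squared distance from each point to its closest center, summed.
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

def make_blobs(seed=1):
    """Hypothetical toy data: three Gaussian blobs of 30 points each."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blob_centers for _ in range(30)]

points = make_blobs()
# Steps 1 and 2: WCSS for K = 1..10, keeping the best of five restarts per K.
wcss_by_k = {k: min(kmeans_wcss(points, k, seed=s) for s in range(5))
             for k in range(1, 11)}
# Step 3 (in place of a plot): print the curve and look for the bend.
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 2))
```

On data like this, the curve should fall steeply up to the true number of blobs and flatten afterwards; on real data the bend is often less distinct, which is why the method is usually read by eye.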

An analysis of the influence of affordable housing system on price

Published in Dawei Zheng, Industrial, Mechanical and Manufacturing Science, 2015

Dawei Zheng

On the whole, there are three methods for selecting initial clustering centers (He et al. 2004): random sampling, distance optimization, and density estimation. Qing & Zheng (2009) improve the K-means algorithm by means of dividing sampling. Tong et al. (2009), Yao & Shi (2010), Cao et al. (2009), and Feng (2007) use the max-min distance method to determine the initial centers. Chen & Wang (2012) and Sun & Liu (2008) take as initial centers the data objects in the highest-density area that are farthest from each other, which effectively reduces the possibility of selecting isolated objects. Besides considering the area density of data objects, Feng et al. (2012) and Li & Wang (2010) obtain the current optimal initial centers through a minimum spanning tree. Zhang & Duan (2013) introduce the individual silhouette coefficient: the K-means algorithm is run several times, and the silhouette coefficients of the final results are calculated and compared to obtain the optimal initial clustering centers and the final clustering result.
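To make the distance-optimization family concrete, here is a minimal sketch of two seeding rules. The first is a generic max-min (farthest-first) rule; the cited papers' exact procedures are not given in this excerpt, so it should not be read as their method. The second is the randomized K-means++ rule, which samples each new center with probability proportional to its squared distance from the nearest center already chosen. The function names, the choice of the first center, and the toy points are assumptions for illustration.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def maxmin_init(points, k, first_index=0):
    """Deterministic max-min seeding: each new center is the point whose
    distance to its nearest already-chosen center is largest."""
    centers = [points[first_index]]  # starting point is an assumption
    # nearest[i] = squared distance from points[i] to its closest chosen center
    nearest = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=lambda idx: nearest[idx])
        centers.append(points[i])
        nearest = [min(d, squared_distance(p, points[i]))
                   for d, p in zip(nearest, points)]
    return centers

def kmeanspp_init(points, k, seed=0):
    """Randomized K-means++ seeding: sample each new center with probability
    proportional to its squared distance from the nearest chosen center."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    nearest = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        chosen = rng.choices(points, weights=nearest, k=1)[0]
        centers.append(chosen)
        nearest = [min(d, squared_distance(p, chosen))
                   for d, p in zip(nearest, points)]
    return centers

# Hypothetical toy data: three tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (0, 10), (1, 10), (0, 11)]
maxmin_centers = maxmin_init(points, 3)
kmeanspp_centers = kmeanspp_init(points, 3)
print(maxmin_centers)
print(kmeanspp_centers)
```

The max-min rule always reaches for extreme points, so it can pick outliers; the randomized rule trades that determinism for a lower chance of seeding on an isolated object, which is the same concern the density-based methods above address.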

Flower Pollination Algorithm

Published in A Vasuki, Nature-Inspired Optimization Algorithms, 2020

A Vasuki

In [4], a flower pollination algorithm (FPA) with bee pollinator is proposed in order to improve the global and local search abilities and to prevent FPA from getting trapped in local minima. Three strategies have been included in FPA for this purpose. The discard solution (pollen) operator and the crossover operator, taken from the artificial bee colony algorithm, enhance diversity in the global search (improving exploration), whereas the elite-based mutation operator is included to enhance the local search ability (improving exploitation). Honey bees are used to perform Lévy flights and do a global search. If a solution is not the global best and has not improved after a fixed number of iterations, it is discarded (discard pollen operator) and a new solution is generated randomly. This helps the search escape local minima. Crossover increases the diversity of the population. The elite-based mutation operator improves the convergence speed. The proposed algorithm has been tested on data clustering, and results for several data sets show the superiority of the FPA with bee pollinator. Clustering is one of the techniques for data analysis, data mining, image classification, and related applications. k-means clustering is one of the most popular techniques used for data clustering and classification. One of its disadvantages is that it might lead to locally optimal solutions, since the final solution depends upon the initial values. The results of FPA with bee pollinator have been compared with DE, ABC, FPA, CS, PSO, and k-means clustering, and it has been found that the hybrid FPA–bee pollinator surpasses the other algorithms in accuracy, convergence speed, and stability.

Object of Interest and Unsupervised Learning-based Framework for an Effective Video Summarization Using Deep Learning

Published in IETE Journal of Research, 2023

Alok Negi, Krishan Kumar, Parul Saini

This paper suggests a clustering approach to summarization that generates concise and intelligent video abstractions, giving users effective and efficient access to large volumes of video content. It is based on objects of interest (OoI), feature extraction, and dimensionality reduction of the featured frames. The OoI and ResNet-50-based solution makes producing the pertinent video summary easier and more dependable. PCA was used to model the reduced dimensionality, reconstruction, and recovery of frames according to the number of principal components, and unimportant frames that carry no information were eliminated. After that, the K-means approach with silhouette score is used for clustering. From each cluster, the frames with the maximum mean and standard deviation are declared candidate frames. Further, PCC removes the redundant candidate frames and selects keyframes to generate the video summary. Finally, the experimental results on a benchmark dataset with various views of videos demonstrate that the proposed method outperforms the state-of-the-art models with the best Recall score. In the future, we will work on long multi-view surveillance videos for various purposes and on model training to produce more OoI.

Estimation of IRI from PASER using ANN based on k-means and fuzzy c-means clustering techniques: a case study

Published in International Journal of Pavement Engineering, 2022

Jalal Barzegaran, Reza Shahni Dezfoulian, Mansour Fakhri

In another approach to identifying the optimal number of clusters in k-means clustering, the Elbow method was applied, using the sum of squared errors (SSE), that is, the squared distances between the data points in each cluster and the centroid of that cluster. With SSE as the performance indicator, smaller values indicate that each cluster is more tightly grouped. As the number of clusters approaches the optimum, SSE declines rapidly. Once the number of clusters exceeds the optimum, SSE continues to decline, but much more slowly (Syakur et al. 2018, Saputra et al. 2020). Figure 5 depicts the trend of SSE as the number of clusters increases. SSE declined dramatically from 10,817 with one cluster to 1,758 with four clusters; thereafter, the rate of change in SSE remained constant. Therefore, the clustering with four centroids was selected for further analysis. The results of clustering with the k-means algorithm for 4, 5, and 6 centroids are provided in Table 5.
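For reference, the SSE in an elbow analysis of this kind is usually written as below. The notation is assumed here and does not come from the excerpt: $C_j$ is the set of points assigned to cluster $j$ and $\mu_j$ is its centroid. The second line simply restates the two reported SSE values as a relative reduction.

```latex
\[
\mathrm{SSE}(K) = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2}
\]
\[
\frac{\mathrm{SSE}(1) - \mathrm{SSE}(4)}{\mathrm{SSE}(1)}
  = \frac{10817 - 1758}{10817} \approx 0.84
\]
```

In other words, roughly 84% of the one-cluster SSE is removed by the time four clusters are used, which is consistent with placing the elbow at four.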

Optimising classification of proximal arm strength impairment in wheelchair rugby: A proof of concept study

Published in Journal of Sports Sciences, 2021

Barry S. Mason, Viola C. Altmann, Michael J. Hutchinson, Nicola Petrone, Francesco Bettella, Victoria L. Goosey-Tolfrey

Significantly correlated strength measures were entered into a k-means cluster analysis, which partitions data observations into non-overlapping subgroups (clusters) based on their proximity to the mean of each cluster. The sum of squared within-cluster distances to the centroid of each cluster was calculated and plotted against the number of clusters. Using the elbow method (Thorndike, 1953), the inflection point in the curve was used to identify the optimal number of clusters entered into the k-means analysis. Cluster membership and the resulting distance from cluster centroids were determined for each case. The mean intra-cluster distance in relation to the mean distance to the adjacent cluster was then used to calculate the mean silhouette coefficients (Rousseeuw, 1987). Mean silhouette coefficients quantify the overall strength of the class structures identified and were categorized as having “no substantial structure” (≤ 0.25), “weak structure” (0.25 to 0.50), “reasonable structure” (0.51 to 0.70) or “strong structure” (≥ 0.71) (Kaufman & Rousseeuw, 2005).
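The silhouette calculation and the structure categories can be sketched as follows. This is an illustrative sketch, not the study's code: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the silhouette is (b - a) / max(a, b). The function names and toy data are assumptions, and because the quoted bands leave small gaps (for example between 0.50 and 0.51), the sketch assumes each band extends up to and including its upper bound.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest neighbouring cluster.
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def structure_label(s):
    """Map a mean silhouette value to the categories quoted above
    (band edges treated as inclusive upper bounds, an assumption)."""
    if s <= 0.25:
        return "no substantial structure"
    if s <= 0.50:
        return "weak structure"
    if s <= 0.70:
        return "reasonable structure"
    return "strong structure"

# Hypothetical toy data: three tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (0, 10), (1, 10), (0, 11)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
score = mean_silhouette(points, labels)
print(round(score, 3), structure_label(score))
```

Because the toy groups are compact and far apart, the mean silhouette falls in the "strong structure" band; overlapping clusters would pull it toward the weaker categories.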