Hyperspectral imaging analyses of concrete structures with emphasis on bridges
Published in Joan-Ramon Casas, Dan M. Frangopol, Jose Turmo, Bridge Safety, Maintenance, Management, Life-Cycle, Resilience and Sustainability, 2022
A. Strauss, F. Sattler, M. Granzner, D.M. Frangopol
The data analysis was carried out with k-means clustering, an unsupervised data classification algorithm that divides the dataset into different groups. First, k cluster centres, called centroids, are randomly assigned, and each data point (in this case an individual pixel of the recordings) is assigned to the cluster with the closest mean value (Morton 2019, Ji et al. 2019, Duda & Hart 1973). Afterwards, the centroids are recalculated from the current clusters, each data point is reassigned to the cluster closest to it, and so on. This is repeated until the centroid values no longer change (Khouj et al. 2018, Ranjan et al. 2017, Schowengerdt 2007). The number of clusters is defined in advance; in some cases it is necessary to adjust it and carry out the calculation again in order to obtain meaningful results.
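As a minimal sketch of this workflow — with synthetic spectra standing in for real hyperspectral recordings — scikit-learn's KMeans groups pixels by their spectral signatures, with k chosen in advance:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for a hyperspectral recording:
# 1000 "pixels", each with 50 spectral bands.
rng = np.random.default_rng(0)
pixels = rng.normal(size=(1000, 50))

# Cluster the pixels into k groups; k is fixed in advance and may
# need adjusting (and the run repeating) for meaningful results.
k = 4
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

labels = kmeans.labels_              # cluster index per pixel
centroids = kmeans.cluster_centers_  # one mean spectrum per cluster
print(labels.shape, centroids.shape)  # (1000,) (4, 50)
```

On a real recording, the pixel array would come from reshaping the hyperspectral cube from (rows, cols, bands) to (rows*cols, bands).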
Unsupervised and Semi-supervised Machine Learning Algorithms for Cognitive IoT Systems
Published in Pethuru Raj, Anupama C. Raman, Harihara Subramanian, Cognitive Internet of Things, 2022
Pethuru Raj, Anupama C. Raman, Harihara Subramanian
The objective of K-means is simple: group similar data points together and discover underlying patterns. The algorithm starts with a first set of randomly selected centroids, which act as the starting points for every cluster, and then performs iterative (repetitive) calculations to optimize the positions of the centroids. It halts this process of creating and optimizing clusters when either:
- The centroids have stabilized – their values no longer change because the clustering has converged.
- The process has reached the defined number of iterations.
Being easy to understand, fast to train and quick to deliver results has made K-means an extensively used and popular technique for data cluster analysis. However, there are cases where its performance is not as competitive as that of more sophisticated clustering techniques (k-medoids, k-means++), because slight variations in the data can lead to high variance.
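These two halting conditions map directly onto scikit-learn's KMeans parameters; a small sketch on synthetic data (the data and parameter values here are illustrative, not from the excerpt):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))

# tol: centroid movement below this threshold counts as "stabilized";
# max_iter: hard cap on the number of assign/update iterations.
km = KMeans(n_clusters=3, max_iter=300, tol=1e-4, n_init=10,
            random_state=0).fit(X)

print(km.n_iter_)  # iterations actually run before halting
```

Whichever condition triggers first stops the loop; `n_iter_` reports how many iterations were actually needed.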
Work stress induced chronic insomnia in construction
Published in Imriyas Kamardeen, Work Stress Induced Chronic Diseases in Construction, 2021
To identify and classify naturally occurring patterns of relationships of insomnia with job stressors, stress, stress coping styles, mental disorders and job performance, clustering was conducted using IBM SPSS 26.0. Three clustering techniques are available in IBM SPSS, namely K-means clustering, hierarchical clustering and TwoStep clustering. K-means clustering is a non-hierarchical clustering technique which can be used to perform clustering only based on continuous variables. Similarly, the hierarchical clustering technique processes only continuous variables. TwoStep clustering is a combination of non-hierarchical and hierarchical techniques, and it can process both categorical and continuous data for clustering. Hence, this study used the TwoStep clustering approach. Accordingly, the cluster analysis was conducted in two successive steps: (1) the TwoStep clustering technique was used to group the cases into n = k clusters by maximising between-cluster differences and minimising within-cluster variance in insomnia and the other variables concerned; (2) statistical tests for the significance of cluster mean differences were applied to verify whether the clusters differed significantly on all variables included in the analysis.
Spatial accessibility analysis and location optimization of emergency shelters in Deyang
Published in Geomatics, Natural Hazards and Risk, 2023
Zuopei Zhang, Yunfeng Hu, Wei Lu, Wei Cao, Xing Gao
K-means clustering is a classical unsupervised machine learning method. It iteratively minimizes the distance between cluster centroids and sample points until the assignment no longer varies (MacQueen 1967). The steps are: the data is divided into k groups in advance, k objects are randomly selected as the initial cluster centers, the distance between each object and each seed cluster center is then calculated, and each object is assigned to the cluster center closest to it:

\( d(x_i, c_j) = \sqrt{\sum_{t=1}^{m} (x_{it} - c_{jt})^2} \)

where \(x_i\) denotes the i-th object in the data, \(c_j\) denotes the j-th cluster center, \(x_{it}\) denotes the t-th attribute of the i-th object, and \(c_{jt}\) denotes the t-th attribute of the j-th cluster center. When all sample points have been assigned, the centroid of each of the k clusters is recalculated; the optimal solution places each centroid at the most reasonable distance from all sample points in every direction within its cluster, and it is calculated as follows:

\( c_l = \frac{1}{|S_l|} \sum_{x_i \in S_l} x_i \)

where \(c_l\) denotes the l-th cluster center, \(|S_l|\) denotes the number of objects in the l-th cluster, and \(x_i\) denotes the i-th object in the l-th cluster.
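The distance and centroid formulas translate directly into NumPy; a sketch of one assign-and-update iteration on synthetic data (the object count, attribute count m, and k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 objects x_i with m = 5 attributes
k = 3
C = X[rng.choice(len(X), k, replace=False)]  # initial cluster centers c_j

# d(x_i, c_j) = sqrt(sum_t (x_it - c_jt)^2): Euclidean distance matrix
d = np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))

# Assign each object to its closest cluster center.
labels = d.argmin(axis=1)

# c_l = (1 / |S_l|) * sum over x_i in S_l of x_i: recompute each centroid
C_new = np.array([X[labels == l].mean(axis=0) for l in range(k)])
print(d.shape, C_new.shape)  # (100, 3) (3, 5)
```

Repeating the assignment and recomputation until the assignments stop changing yields the full algorithm.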
Refined PSO Clustering for Not Well-Separated Data
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2023
Chilankamol Sunny, Shibu Kumar K. B
Several extensions of K-means have been published over the years, improving specific areas of the basic algorithm. Many attempts have been made to speed up the clustering process, such as using a kd-tree to represent the data, updating cluster means with groups of points rather than a single point, or a single-pass version of K-means (Jain, 2016). One extension that advocates soft assignment, in contrast to the hard assignment strategy of K-means, is Fuzzy C-means, in which each data item can be a member of multiple clusters with a membership value. Another variant is Bisecting K-means, introduced in 2000, which recursively partitions the dataset into two clusters at each stage. X-means (Jain, 2016) automatically finds the number of clusters K by optimising the Bayesian Information Criterion. Kaufman & Rousseeuw proposed K-medoids clustering, in which each cluster is represented by a medoid – an actual data point – instead of the mean (Kaufman & Rousseeuw, 2005). Several such variants came up focusing on targets like robustness against noise, lower computational cost, accuracy improvements, avoiding the need to specify the value of K, etc. Another noteworthy work extended the basic K-means clustering, which categorises only numeric data, to operate on data with mixed numeric and categorical features. Almost all of these extensions introduce additional parameters that must be specified by the user, which is undesirable, and the need for a generic clustering algorithm that can produce near-to-ideal partitioning on an arbitrary input dataset has yet to be met.
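Of the variants mentioned, k-means++ changes only the seeding step: the first center is drawn uniformly, and each subsequent center is sampled with probability proportional to its squared distance from the nearest center already chosen. A from-scratch sketch on synthetic data (function name and data are illustrative):

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: spread initial centers apart to reduce
    sensitivity to the random start."""
    centers = [X[rng.integers(len(X))]]  # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1),
                    axis=1)
        # Sample the next center with probability proportional to d2.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
init = kmeans_pp_init(X, 4, rng)
print(init.shape)  # (4, 2)
```

The resulting centers feed into the ordinary assign/update loop; the rest of the algorithm is unchanged.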
An Improved K-Means Clustering for Segmentation of Pancreatic Tumor from CT Images
Published in IETE Journal of Research, 2021
R. Reena Roy, G. S. Anandha Mala
K-Means clustering algorithm (conventional):
1. Find the number of clusters (assume k).
2. Randomly initialize the k centroids (cluster center points).
3. Allocate each object to the closest centroid based on similarity.
4. Calculate the latest centroid (mean) of each cluster.
5. Assign each data point to the newly calculated centroid position; repeat steps 3 and 4 until no reassignment occurs.
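The steps above translate almost line for line into NumPy; a sketch on synthetic data, using "no reassignment" as the stopping test:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: given k, randomly initialize the centroids.
    centroids = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Step 3: allocate each object to its closest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Step 5: stop when no reassignment occurs.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: recompute each centroid as its cluster mean
        # (keep the old centroid if a cluster goes empty).
        centroids = np.array([X[labels == j].mean(axis=0)
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, 2)
print(np.unique(labels))
```

Checking assignments rather than centroid values is an equivalent convergence test here: once no point changes cluster, the recomputed centroids stop moving as well.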