Explore chapters and articles related to this topic
Advance Object Detection and Clustering Techniques Used for Big Data
Published in Ankur Dumka, Alaknanda Ashok, Parag Verma, Poonam Verma, Advanced Digital Image Processing and Its Applications in Big Data, 2020
Ankur Dumka, Alaknanda Ashok, Parag Verma, Poonam Verma
Steps in K-means clustering algorithm
1. Decide upon the number of clusters, K.
2. Select centroids in the dataset: choose K random points from the group.
3. Assign each data point to the closest centroid.
4. Repeat this procedure for all the data points in the group, thereby creating the clusters.
The model for predicting the output of the unlabeled data is then ready.
Choosing the optimal value of K
Algorithms such as K-means, K-medoids, etc. depend on the parameter K, which specifies the number of clusters in the dataset. The value of K depends on the shape and scale of the distribution of points within a dataset. K is chosen to strike a balance between maximum compression of the data using a single cluster and maximum accuracy obtained by assigning each data point to its own cluster. There are different methods to select an appropriate value of K, two of which are the Elbow method and the silhouette method.
The Elbow method evaluates a range of values of K by calculating the Within-Cluster Sum of Squares (WCSS) for each, and selects the K at which the decrease in WCSS first begins to level off. The approach can be divided into three parts. First, the squared error for each point is found by squaring the distance of the point from its representative, i.e. the centre of its cluster. Second, the WCSS score is obtained as the sum of these squared errors over all points. Finally, either the Euclidean distance or the Manhattan distance can be used as the distance measure when computing the score for the different values of K.
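The steps above, together with the WCSS score, can be sketched in a few lines of Python. This is a minimal illustration rather than the chapter's implementation: the function names, the fixed random seed, the sample points and the centroid-update step (the usual Lloyd-style refinement, which the step list does not spell out) are all assumptions made for the example, and squared Euclidean distance is used throughout.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels, wcss) for a plain Lloyd-style K-means."""
    rng = random.Random(seed)
    # Initial centroids: K data points chosen at random from the dataset.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step (assumed, standard): move each centroid to the mean
        # of the points currently assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # WCSS: sum of squared distances from each point to its cluster centre.
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss

# Hypothetical sample data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, wcss = kmeans(points, k=2)
```

Running `kmeans` for K = 1, 2, 3, ... and recording the returned `wcss` gives the curve that the Elbow method inspects.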
Customer Mobile Behavioral Segmentation and Analysis in Telecom Using Machine Learning
Published in Applied Artificial Intelligence, 2022
Eman Hussein Sharaf Addin, Novia Admodisastro, Siti Nur Syahirah Mohd Ashri, Azrina Kamaruddin, Yew Chew Chong
Subsequently, the optimal number of clusters was determined by using the Elbow Method and computing the Within Cluster Sum of Squares (WCSS) score. Cui (2020) described the WCSS score, which measures the variation within every cluster; the lower the WCSS value, the more effective the clustering. Cui also stated that the WCSS score decreases as K increases, and that K is selected at the point where the decrease slows, which appears as an “elbow” in the curve. The implementation of the Elbow method is shown in the code snippet in Figure 5. The results of the Elbow method indicate that the optimal K is four, as shown in the graph in Figure 6, where the elbow is located at K = 4. Hence, K was set to four for the K-means algorithm.
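In the paper the elbow is read from a plot, and its Figure 5 snippet is not reproduced here. As a hedged illustration of how the same decision could be automated, the sketch below picks the K whose point on the WCSS curve lies farthest from the straight line joining the curve's first and last points. This heuristic, the function name `elbow_k` and the WCSS values are all assumptions chosen for the example and are not the paper's data or method.

```python
def elbow_k(ks, wcss):
    """Pick the K whose (K, WCSS) point lies farthest from the straight
    line joining the first and last points of the curve."""
    x1, y1 = ks[0], wcss[0]
    x2, y2 = ks[-1], wcss[-1]

    def distance_to_chord(x, y):
        # Unnormalised perpendicular distance; the constant denominator
        # does not affect which point is farthest.
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1)

    return max(zip(ks, wcss), key=lambda kw: distance_to_chord(*kw))[0]

# Hypothetical WCSS values for illustration only (not the paper's results).
ks = [1, 2, 3, 4, 5, 6, 7, 8]
wcss_values = [1000.0, 600.0, 350.0, 150.0, 120.0, 100.0, 85.0, 75.0]
best_k = elbow_k(ks, wcss_values)
```

With these illustrative values the curve drops steeply up to K = 4 and flattens afterwards, so the heuristic selects four. A visual check of the plot remains advisable, since real WCSS curves often have a less pronounced bend.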
A hybrid system of data-driven approaches for simulating residential energy demand profiles
Published in Journal of Building Performance Simulation, 2021
Sandhya Patidar, David Paul Jenkins, Andrew Peacock, Peter McCallum
In this section, a K-means clustering approach is presented to organise the large volume of data accumulated in the three data portfolios. K-means clustering is a widely applied unsupervised machine learning algorithm that aims to organise a dataset into an optimum number of clusters/groups, based on the selected features/characteristics, such that the items within a cluster are coherent with respect to the selected features and are considerably distinct from those in other clusters. Figure 4 shows the clustering of dwellings across the three portfolios, with the statistical mean and median used as grouping features/characteristics. For each portfolio, the statistical mean and median were estimated for each of the dwellings over the entire period. An optimum number of clusters in each portfolio was identified using the Elbow method. The Elbow method plots the total within-cluster variation (i.e. the sum of squared errors within the clusters) against the number of clusters. The optimum number of clusters is chosen at the point where the within-cluster sum no longer changes significantly as the number of clusters changes. Please note that a coherent colour scheme is adopted across all the Graphs and Tables presented in the paper: Portfolio 1 (Findhorn, 15 February–28 March 2015) datasets are illustrated in shades of Blue; Portfolio 2 (Fintry – July 2017) datasets are illustrated in shades of Green; and Portfolio 3 (Fintry – November 2017) datasets are illustrated in shades of Orange.
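The grouping features described here, a mean and a median per dwelling, can be computed as in the following sketch. The dwelling identifiers, the demand readings and the function name are hypothetical and serve only to show the shape of the feature table that would then be passed to K-means.

```python
from statistics import mean, median

def dwelling_features(readings_by_dwelling):
    """Map each dwelling ID to a (mean, median) feature pair."""
    return {
        dwelling: (mean(values), median(values))
        for dwelling, values in readings_by_dwelling.items()
    }

# Hypothetical demand readings over the whole period, one list per dwelling.
demand = {
    "dwelling_A": [0.5, 1.0, 1.5, 5.0],  # one large peak skews the mean
    "dwelling_B": [2.0, 2.0, 3.0, 3.0],  # flat profile: mean equals median
}
features = dwelling_features(demand)
```

Using both statistics lets the clustering separate dwellings with occasional large peaks, where the mean sits well above the median, from dwellings with a steadier profile.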
Data-driven approach to prioritize residential buildings’ retrofits in cold climates using smart thermostat data
Published in Architectural Science Review, 2023
The RC values estimated from the decay curve and the energy balance methods were clustered separately using the K-means algorithm. First, the optimal number of clusters was found by using the Elbow method with respect to the within-cluster sum of squares (WCSS). The goal of the Elbow method is to find a small number of clusters beyond which adding further clusters yields little additional reduction in the WCSS. The Elbow method reported four as the optimal number of clusters for the RC values estimated from each method. Then, the need for retrofit for each cluster was ranked according to the RC range of the houses within each cluster.