Analyzing the Relationship between Human Development and Competitiveness Using DEA and Cluster Analysis
Published in Sarah Ben Amor, Adiel Teixeira de Almeida, João Luís de Miranda, Emel Aktas, Advanced Studies in Multi-Criteria Decision Making, 2019
The features DEA-1-HDItoGCI score, DEA-2-GCItoHDI score, HDI dimensions (Life expectancy index, Education index, Income index), and GCI subindexes (Basic requirements, Efficiency enhancers, Innovation and sophistication factors) are incorporated into the cluster analysis. Initially, hierarchical clustering was performed for the year 2010 using Ward’s method with squared Euclidean distance, yielding the dendrogram shown in Figure 11.2; the tree was cut at four clusters. For the following years, 2011–2017, K-means clustering was run with K=4, with the centroids initialized from the previous year’s cluster analysis results. The K-means runs were thus chained in order over the years. Table 11.7 presents the cluster analysis results, and Table 11.8 provides information about the cluster features.
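The year-over-year scheme described above can be sketched as follows. This is a minimal illustration with random placeholder data, not the chapter's actual dataset; the country count, feature count, and array names are assumptions. Ward's clustering cut at four clusters seeds the first set of centroids, and each subsequent year's K-means is initialized from the previous year's centroids.

```python
# Sketch of the chained clustering scheme: Ward's method for the base
# year, then K-means seeded with the previous year's centroids.
# All data below are random placeholders (60 countries x 8 features).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_2010 = rng.random((60, 8))
X_by_year = {year: rng.random((60, 8)) for year in range(2011, 2018)}

# Base year 2010: Ward's method (squared Euclidean), cut at 4 clusters.
Z = linkage(X_2010, method="ward")
labels = fcluster(Z, t=4, criterion="maxclust")

# Centroids of the 2010 clusters seed K-means for 2011, and so on.
centroids = np.vstack([X_2010[labels == k].mean(axis=0) for k in range(1, 5)])
for year in range(2011, 2018):
    km = KMeans(n_clusters=4, init=centroids, n_init=1).fit(X_by_year[year])
    labels, centroids = km.labels_, km.cluster_centers_
```

Seeding each year's K-means from the prior year's centroids keeps cluster identities comparable across years, which is what makes the longitudinal tables meaningful.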
Unsupervised Learning
Published in Peter Wlodarczak, Machine Learning and its Applications, 2019
There is no theoretical justification for selecting a particular linkage method; the choice should be based on the application domain. As with virtually any machine learning technique, variations and alternative methods exist. In practice, Ward’s method is widely applied. Ward’s method uses the increase in the squared error that results when two clusters merge as its proximity measure. It combines an objective-function approach with an agglomerative hierarchical clustering algorithm, using the same objective function that we have seen in equation 6.2 for k-means clustering. In each iteration, the pair of clusters whose merger produces the minimum increase in intracluster variance, measured by the sum of squared errors, is merged. The sum of squares starts at zero, since every data point begins in its own cluster, and increases as clusters grow; Ward’s method aims to keep this growth as small as possible. To select each merge, every combination of clusters must be considered, which makes Ward’s method computationally expensive. However, it still requires significantly less computation than other methods.
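The merge criterion can be verified numerically. Ward's merging cost has a closed form: the increase in the sum of squared errors (SSE) when clusters $i$ and $j$ merge equals $\frac{n_i n_j}{n_i + n_j}\lVert c_i - c_j\rVert^2$, where $c_i, c_j$ are the centroids. The sketch below, using arbitrary illustrative points, checks that this formula matches the SSE increase computed directly.

```python
# Numerical check of Ward's merge cost: SSE(merged) - SSE(i) - SSE(j)
# equals (n_i * n_j) / (n_i + n_j) * ||centroid_i - centroid_j||^2.
# The points are arbitrary illustrative data.
import numpy as np

def sse(points):
    """Sum of squared distances of points to their centroid."""
    return ((points - points.mean(axis=0)) ** 2).sum()

a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # cluster i
b = np.array([[4.0, 4.0], [5.0, 4.0]])              # cluster j

increase = sse(np.vstack([a, b])) - sse(a) - sse(b)
ni, nj = len(a), len(b)
formula = ni * nj / (ni + nj) * ((a.mean(0) - b.mean(0)) ** 2).sum()
assert np.isclose(increase, formula)
```

Because the cost depends only on cluster sizes and centroids, implementations need not re-scan all member points at each merge, which softens the method's computational expense in practice.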
Finding Clusters
Published in Wendy L. Martinez, Angel R. Martinez, Jeffrey L. Solka, Exploratory Data Analysis with MATLAB®, 2017
Ward’s method tends to combine clusters that have a small number of observations. It also has a tendency to find clusters that are spherical and of similar size. Due to its sum of squares criterion, it is sensitive to the presence of outliers in the data set.
Digital Transformation for Agility and Resilience: An Exploratory Study
Published in Journal of Computer Information Systems, 2023
George Mangalaraj, Sridhar Nerur, Rahul Dwivedi
To examine the validity of earnings calls data for our research, we first analyzed the entire corpus to see whether there are discernible patterns in the data. Specifically, we pre-processed the text in the transcripts for each company and then obtained a document-term matrix (rows with words in the vocabulary and columns with each of the earnings call transcripts), using a term-frequency inverse-document-frequency (TF-IDF) vectorizer implemented in Python. The document-term matrix was then used to compute cosine similarities between the companies’ earnings call transcripts irrespective of document size. These similarities were converted to distances and input to Ward’s method, a popular hierarchical clustering algorithm, which produces a dendrogram (see Figure 3). As the dendrogram shows, there are three distinct groups of retailers dealing with differing product assortments.
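A minimal sketch of this pipeline, using scikit-learn and SciPy, might look as follows. The transcript texts here are short placeholders, not actual earnings call data, and the variable names are assumptions.

```python
# TF-IDF -> cosine similarity -> distance -> Ward's linkage, as in the
# pipeline described above. Transcripts are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

transcripts = [
    "apparel inventory omnichannel growth",
    "grocery supply chain margins",
    "apparel stores ecommerce growth",
    "grocery fresh food pricing",
]

tfidf = TfidfVectorizer().fit_transform(transcripts)  # document-term matrix
sim = cosine_similarity(tfidf)                        # size-independent similarity
dist = 1.0 - sim                                      # convert to distances

# Ward's linkage expects a condensed distance vector; checks=False
# tolerates tiny floating-point asymmetries on the diagonal.
Z = linkage(squareform(dist, checks=False), method="ward")
# scipy.cluster.hierarchy.dendrogram(Z) would render the tree.
```

Note that SciPy's `ward` linkage is formally defined for Euclidean distances; applying it to cosine-derived distances, as described here, is a common pragmatic choice rather than a strict requirement of the method.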
Skill-Agnostic analysis of reflection high-energy electron diffraction patterns for Si(111) surface superstructures using machine learning
Published in Science and Technology of Advanced Materials: Methods, 2022
Asako Yoshinari, Yuma Iwasaki, Masato Kotsugi, Shunsuke Sato, Naoka Nagamura
However, the other routines did not work well; for example, four structures observed during heating were recognized as a single cluster when single linkage was used. Ward’s method merges clusters in such a manner that the increase in the squared distance from the center of gravity caused by the merge is minimized. Although computationally expensive, Ward’s method is highly sensitive, and therefore, in our case, it is considered the most successful at categorization. From these results, Ward’s method was determined to be the most appropriate hierarchical clustering method for RHEED pattern change detection. As shown in Figure 2, the dynamic change in the diffraction patterns due to the structural phase transition driven by the amount of adatom deposition occurs continuously. Therefore, cluster boundary detection is inherently difficult. Although Ward’s method is based on hard clustering, our results suggest that it is a useful means of readily understanding surface superstructural phase transitions.
Wrong-way driving crash injury analysis on arterial road networks using non-parametric data mining techniques
Published in Journal of Transportation Safety & Security, 2022
Sajidur Rahman Nafis, Priyanka Alluri, Wensong Wu, B. M. Golam Kibria
All the dissimilarity measures from the Gower distance are then assembled into a matrix and agglomerated into clusters using linkage. To measure dissimilarity between clusters and group them, Ward’s minimum variance linkage was used, as it minimizes the total within-cluster variance (Murtagh & Legendre, 2014). Ward’s method explores and locates partitions with a small sum of squares while creating clusters. The process runs as follows:
1. Generate a cluster for each point, so that every point is in its own cluster and the sum of squares = 0.
2. Merge the two clusters that result in the smallest increase in merging cost, i.e., the sum of squares.
3. Iterate until all clusters aggregate into one single cluster.
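The steps above can be sketched as follows, starting from a precomputed dissimilarity matrix. The 4×4 matrix below is a hypothetical stand-in for the Gower distances (whose computation from mixed-type crash data is omitted here), and the variable names are assumptions.

```python
# Agglomerate a precomputed dissimilarity matrix with Ward's
# minimum-variance linkage. The matrix stands in for Gower distances.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical 4x4 symmetric dissimilarity matrix (zero diagonal).
D = np.array([
    [0.0, 0.2, 0.7, 0.8],
    [0.2, 0.0, 0.6, 0.9],
    [0.7, 0.6, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
])

# Each point starts as its own cluster (sum of squares = 0); linkage
# then repeatedly merges the pair with the smallest merging cost
# until a single cluster remains.
Z = linkage(squareform(D), method="ward")
# Each row of Z records one merge: (cluster_a, cluster_b, cost, size).
```

For n points the linkage matrix has n−1 rows, one per merge, and the final row's size column equals n, confirming that all points end in one cluster.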