Algorithms and Data Structures for Exact Computation of Marginals
Published in Marloes Maathuis, Mathias Drton, Steffen Lauritzen, Martin Wainwright, Handbook of Graphical Models, 2018
The graph must be triangulated for this to be true: a maximum spanning tree of a cluster graph whose clusters are maximal cliques, but whose underlying graph is not triangulated, will not produce a junction tree. This is a standard and practical way to obtain a junction tree in inference code, since the elimination algorithm, as mentioned above, produces the set of maximal cliques, and there are a number of simple but fast greedy strategies to form a maximum spanning tree, including Kruskal’s or Prim’s MST algorithm [17]. Prim’s algorithm can in fact be made to run in $ O(|m^{\prime }| + |n^{\prime }|\log |n^{\prime }|) $ using a Fibonacci heap, where $ n^{\prime } $ (resp. $ m^{\prime } $) is the number of nodes (resp. edges) in the graph of cliques.
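The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: it assumes the maximal cliques are already given, weights each clique-graph edge by the size of the shared separator, and uses Kruskal's algorithm with a simple union-find. The function names `junction_tree` and `has_running_intersection` are hypothetical helpers introduced here for illustration.

```python
from itertools import combinations


def junction_tree(cliques):
    """Maximum-weight spanning tree over the clique graph (Kruskal).

    Edge weight = size of the intersection (separator) of two cliques.
    Returns a list of (i, j, separator) tuples indexing into `cliques`.
    """
    cliques = [frozenset(c) for c in cliques]
    edges = []
    for i, j in combinations(range(len(cliques)), 2):
        sep = cliques[i] & cliques[j]
        if sep:  # only cliques that share a variable are adjacent
            edges.append((len(sep), i, j, sep))
    edges.sort(key=lambda e: e[0], reverse=True)  # heaviest edges first

    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j, sep in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so keep it
            parent[ri] = rj
            tree.append((i, j, sep))
    return tree


def has_running_intersection(cliques, tree):
    """Check that, for every variable, the cliques containing it are connected in the tree."""
    cliques = [frozenset(c) for c in cliques]
    for v in set().union(*cliques):
        nodes = {i for i, c in enumerate(cliques) if v in c}
        adj = {i: set() for i in nodes}
        for i, j, _ in tree:
            if i in nodes and j in nodes:
                adj[i].add(j)
                adj[j].add(i)
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u] - seen:
                seen.add(w)
                stack.append(w)
        if seen != nodes:
            return False
    return True
```

For the cliques of a triangulated chain such as {A,B,C}, {B,C,D}, {C,D,E}, the resulting tree satisfies the running intersection property. For the maximal cliques of a four-cycle (which is not triangulated), no spanning tree does, which the second helper detects.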
Document Clustering: The Next Frontier
Published in Charu C. Aggarwal, Chandan K. Reddy, Data Clustering, 2018
David C. Anastasiu, Andrea Tagarelli, George Karypis
Scatter/Gather [82,45] was an early cluster-based document browsing method that addressed the speed requirement by performing postretrieval clustering on top-ranked documents returned from a traditional information retrieval system. Zamir and Etzioni introduced the well-known Suffix Tree Clustering (STC) [118] algorithm, which creates interesting subtopic clusters based on phrases shared between documents. It follows the assumption that repeated phrases imply topics of interest within the result collection. STC treats a snippet as a string of words, builds a suffix tree over the collection of snippets, and traverses the suffix tree to extract base clusters. The algorithm then uses a binary similarity measure based on overlap of documents to create a base cluster graph. In this graph, each node corresponds to a group of snippets sharing a phrase. The final clustering solution is obtained by finding the connected components in the graph. Zamir and Etzioni also showed that using snippets for clustering is as effective as using whole documents.
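The last two steps of STC (building the base cluster graph and taking its connected components) can be sketched as follows. This is a simplified illustration under stated assumptions: it skips the suffix tree and takes the base clusters as given, mapping each phrase to the set of snippet ids containing it. The overlap rule (link two base clusters when their shared documents exceed a fraction of each cluster) and the default `threshold=0.5` are assumptions of this sketch, and `merge_base_clusters` is a hypothetical name.

```python
def merge_base_clusters(base_clusters, threshold=0.5):
    """Merge base clusters via connected components of the base cluster graph.

    base_clusters: dict mapping phrase -> non-empty set of snippet ids.
    Two base clusters are linked (binary similarity = 1) when the shared
    snippets make up more than `threshold` of each cluster; otherwise 0.
    Returns a list of (sorted phrases, merged snippet ids) tuples.
    """
    phrases = list(base_clusters)
    adj = {p: set() for p in phrases}
    for idx, a in enumerate(phrases):
        for b in phrases[idx + 1:]:
            docs_a, docs_b = base_clusters[a], base_clusters[b]
            common = len(docs_a & docs_b)
            if common / len(docs_a) > threshold and common / len(docs_b) > threshold:
                adj[a].add(b)
                adj[b].add(a)

    # Connected components by depth-first search.
    seen, clusters = set(), []
    for p in phrases:
        if p in seen:
            continue
        component, stack = [], [p]
        seen.add(p)
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        docs = set().union(*(base_clusters[q] for q in component))
        clusters.append((sorted(component), docs))
    return clusters
```

With base clusters such as "graph theory" over snippets {1, 2, 3}, "spanning tree" over {2, 3, 4}, and "cat food" over {7, 8}, the first two overlap on two of three snippets each and are merged, while the third remains its own cluster.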
Issuecrawling
Published in Celia Lury, Rachel Fensham, Alexandra Heller-Nicholas, Sybille Lammes, Angela Last, Mike Michael, Emma Uprichard, Routledge Handbook of Interdisciplinary Research Methods, 2018
Each of the networks is visualized as a cluster graph (according to measures of inlink centrality), and the findings are described. First, are there other (heretofore) undiscovered groups found through the link analysis? Co-link mapping is a procedure that discovers related URLs through interlinking. In the event, we found Facebook to be a large node in many countries, which not only is in keeping with the impression of groups ‘on the move’ to social media but also prompts the question of its (separate) analysis, for Facebook cannot be crawled as above. (Only links to Facebook are on the map, not outlinks from Facebook.) Second, which sites are responsive and fresh? Are they mainly the populist ones? Indeed, the old guard’s web in a variety of European countries is often stale. It also might be of interest to inquire into where the websites are registered and by whom. Are they registered under aliases and hosted outside the country? Or are they registered in country, under one’s own names? In certain countries, these are signs that groups are in hiding or operating in plain sight, so to speak. In Germany, the groups often mask themselves, while in Austria they tend to operate out in the open.
The driving and dependence power between Lean leadership competencies: an integrated ISM/fuzzy MICMAC approach
Published in Production Planning & Control, 2023
Débora Bianco, Moacir Godinho Filho, Lauro Osiro, Gilberto Miller Devós Ganga, Guilherme Luz Tortorella
An analysis of the 69 papers in Appendix A made it possible to draw up an initial list of Lean leadership competencies. From the literature, 26 competencies were found (Figure 2). To refine this list, we performed a cluster analysis based on word similarity. The excerpts of the 69 SLR articles were coded and linked to the competencies of Lean leaders, then analyzed using the NVivo software, which generated a cluster graph of word similarity. The cluster analysis was carried out to avoid redundancy among the variables, which can hinder the empirical research: comparing two similar concepts could generate inconsistency in the final ISM model. The results showed high similarity between some of the competencies. Therefore, competencies with high similarity were grouped (circled in Figure 3). In addition, the four Lean experts confirmed the similarity between the concepts and approved the combination of the variables. Table 7 presents the final list of 18 competencies used in our study.
Seam carving-based Arabic handwritten sub-word segmentation
Published in Cogent Engineering, 2020
Lamia Berriche, Abeer Al-Mutairy
In (Khan et al., 2014), the authors proposed a method for segmenting Arabic handwritten text into its constituent sub-words. The concept is based on global binarization of an image at various thresholding levels. When each sub-word within the image is processed at multiple threshold levels, a cluster graph is obtained in which each cluster represents an individual sub-word of the word. Once the clusters are obtained, segmentation reduces to automatically selecting the respective cluster. The method was tested on 537 images from the AHTID/MW database and achieved an accuracy of 95.3%. Its novelties are that (i) it does not depend on the skew of the document or of the line, and (ii) it is robust, in that it is not affected by distortion. Its main problem is that diacritics are ignored, which affects the extraction of some PAWs whose size is similar to that of the diacritics.
Text mining analysis on students’ expectations and anxieties towards data analytics course
Published in Cogent Engineering, 2022
Rex Bringula, SAIDA Ulfa, John Paul P. Miranda, Francis Arlando L Atienza
The Elbow method of the factoextra R library was used to determine the optimal number of clusters. The factoextra library is generally used to extract and visualize the results of multivariate analysis (The R Foundation, 2022). The clusters were generated with the hclust function, found in the stats R library, using ward.D2 as the method. This method employs squared Euclidean distance. The cluster graph was created using the plot function in the graphics R library.
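To make the idea behind the elbow criterion concrete, the following standard-library Python sketch runs a naive Ward agglomeration and records the total within-cluster sum of squares for each number of clusters. It is not the authors' R code and does not call hclust or factoextra; it only illustrates Ward's merge criterion and the quantity an elbow plot typically displays. The function name `ward_wss_by_k` is hypothetical.

```python
def ward_wss_by_k(points):
    """Naive Ward agglomeration on a list of numeric tuples.

    At each step, merge the pair of clusters whose merge increases the
    total within-cluster sum of squares (WSS) the least, i.e. minimise
    |A||B| / (|A| + |B|) * ||centroid_A - centroid_B||^2.
    Returns a dict {k: total WSS with k clusters}.
    """
    clusters = [([p], list(p)) for p in points]  # (members, centroid)

    def wss(members, centroid):
        return sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )

    history = {len(clusters): 0.0}  # every point alone: WSS is zero
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                cost = len(mi) * len(mj) / (len(mi) + len(mj)) * d2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        members = clusters[i][0] + clusters[j][0]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members, centroid))
        history[len(clusters)] = sum(wss(m, c) for m, c in clusters)
    return history
```

For four points forming two tight pairs, such as (0, 0), (0, 1), (10, 0), (10, 1), the WSS stays small down to two clusters and then jumps sharply at one cluster; that jump is the "elbow" suggesting two clusters.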