Real-Time Identification of Performance Problems in Large Distributed Systems
Published in Ashok N. Srivastava, Jiawei Han, Machine Learning and Knowledge Discovery for Engineering Systems Health Management, 2016
Moises Goldszmidt, Dawn Woodard, Peter Bodik
This is called the “Chinese restaurant process”: each observation i is conceptually a guest who, upon entering a restaurant, either sits at an already occupied table, with probability proportional to the number of guests at that table, or sits at an empty table (in the standard formulation, with probability proportional to a concentration parameter α).
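The seating rule can be simulated directly. The following is a minimal sketch, not the chapter authors' code. It assumes the standard formulation in which a new table is opened with weight `alpha` (a concentration parameter the excerpt does not name). The function name `chinese_restaurant_process` is hypothetical.

```python
import random


def chinese_restaurant_process(n_guests, alpha, seed=None):
    """Seat guests one at a time; return (table index per guest, table sizes).

    Assumption: an occupied table k is chosen with weight table_sizes[k], and a
    new table with weight alpha (the usual concentration parameter).
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of guests already at table k
    assignments = []   # assignments[i] = table chosen by guest i
    for i in range(n_guests):
        # i guests are already seated, so the total weight is i + alpha.
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        chosen = len(table_sizes)  # default: open a new (empty) table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                chosen = k
                break
        if chosen == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, table_sizes
```

For example, `chinese_restaurant_process(20, alpha=1.0, seed=0)` returns a partition of 20 observations. Larger values of `alpha` tend to produce more tables, which is why the process is a natural prior when the number of clusters is unknown.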
Gene Expression Data Clustering Using Variance-based Harmony Search Algorithm
Published in IETE Journal of Research, 2019
The literature reports many algorithms for clustering gene expression data. K-Means is one well-known technique used for this purpose; however, it is prone to converging to local optima and is sensitive to parameter initialization [2]. Its fuzzy extension is known as Fuzzy C-Means (FCM). In FCM, each gene is assigned a degree of membership in every cluster [11]: for each gene, its membership value in a cluster is proportional to its similarity to that cluster's mean. Partitioning around medoids (PAM) is another widely used clustering algorithm; it computes a medoid for each cluster instead of a mean. However, PAM does not handle high-dimensional data sets well and is also sensitive to parameter initialization, so it is seldom used for gene expression data [12]. The self-organizing map (SOM) is another widely used technique for clustering gene expression data; Tamayo et al. [4] used SOM to identify clusters in gene expression data sets. Compared with K-Means and FCM, SOM is more robust to noisy data; however, it is difficult to find cluster boundaries and balanced solutions with SOM [13].
Eisen et al. [14] identified groups of co-regulated yeast genes by applying a variant of the hierarchical average-link clustering algorithm. Lengfelder et al. [15] proposed an algorithm that combines the strengths of hierarchical clustering and PAM, and applied it to simulated data and to gene expression data sets. It outperforms both PAM and hierarchical clustering, but it requires careful parameter tuning. Liang and Wang [16] developed a dynamic agglomerative clustering technique for gene expression data sets, in which clusters are grouped dynamically and all scattered genes are collected into a single cluster. However, this approach is not robust to noise and has high computational complexity [2]. Qin [17] proposed an improved model-based Bayesian approach, the weighted Chinese restaurant process (CRC), for clustering microarray gene expression data; it uses an iterative weighted Chinese restaurant seating scheme to find the optimal number of clusters.
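To make the FCM membership idea above concrete, here is a minimal sketch of the standard Fuzzy C-Means update. It is not the method of any cited paper. It assumes Euclidean distance, a fuzzifier `m = 2`, and caller-supplied initial centers, which is exactly the initialization sensitivity the text mentions. All function names are hypothetical. In this standard form, a gene's membership in a cluster grows as its distance to that cluster's center shrinks relative to its distances to the other centers, and each gene's memberships sum to 1.

```python
import math


def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """u[i][k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)); each row sums to 1."""
    memberships = []
    for x in points:
        # eps avoids division by zero when a point coincides with a center
        dists = [max(math.dist(x, c), eps) for c in centers]
        row = []
        for d_k in dists:
            denom = sum((d_k / d_j) ** (2.0 / (m - 1.0)) for d_j in dists)
            row.append(1.0 / denom)
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for k in range(n_clusters):
        weights = [memberships[i][k] ** m for i in range(len(points))]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, n_iter=100):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(initial_centers)
    for _ in range(n_iter):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)
```

With toy "expression profiles" `[(0, 0), (0, 1), (10, 10), (10, 11)]` and initial centers `[(1, 1), (9, 9)]`, the two centers settle near the two groups, and each profile receives a membership close to 1 in its own cluster and close to 0 in the other. A poor choice of `initial_centers` can lead to a worse local optimum.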