Chinese restaurant process

The Chinese restaurant process is a probabilistic model that assigns observations to clusters, described through the metaphor of guests entering a restaurant. Each guest represents an observation and is seated at a table that is either already occupied or empty. The probability that a guest joins an occupied table is proportional to the number of guests already seated at that table. The weighted Chinese restaurant process is an improved version of this model that uses a Bayesian approach. The distribution for table choice in both models is the same.

From: Machine Learning and Knowledge Discovery for Engineering Systems Health Management [2016], Gene Expression Data Clustering Using Variance-based Harmony Search Algorithm [2019], Handbook of Mixed Membership Models and Their Applications [2019]

Real-Time Identification of Performance Problems in Large Distributed Systems

Published in Ashok N. Srivastava, Jiawei Han, Machine Learning and Knowledge Discovery for Engineering Systems Health Management, 2016

Moises Goldszmidt, Dawn Woodard, Peter Bodik

This is called the “Chinese restaurant process”: each observation i is conceptually a guest who, upon entering a restaurant, either sits at a table that is already occupied, with probability proportional to the number of guests at that table, or sits at an empty table.
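The seating rule described above can be sketched as a short simulation. This is a minimal illustration, not the chapter's implementation: the function name `chinese_restaurant_process` and the parameter `alpha` are assumptions introduced here. The excerpt does not state the weight given to an empty table, so the sketch uses the standard convention of a positive concentration parameter `alpha`.

```python
import random


def chinese_restaurant_process(n_guests, alpha=1.0, seed=None):
    """Seat guests one at a time; return (table index per guest, table sizes).

    alpha is an assumed concentration parameter (must be > 0): the weight
    of opening a new table. It is not specified in the excerpt above.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of guests at table k
    assignments = []   # assignments[i] = table chosen by guest i
    for _ in range(n_guests):
        # Occupied table k is weighted by its current size;
        # the extra last entry is the empty table, weighted by alpha.
        weights = table_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)      # guest opens a new table
        else:
            table_sizes[choice] += 1   # guest joins an occupied table
        assignments.append(choice)
    return assignments, table_sizes


if __name__ == "__main__":
    seats, sizes = chinese_restaurant_process(20, alpha=1.0, seed=0)
    print("table per guest:", seats)
    print("table sizes:", sizes)
```

Because larger tables attract more guests, a few tables tend to grow large while new tables open more and more rarely. The number of tables (clusters) is therefore not fixed in advance.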

Gene Expression Data Clustering Using Variance-based Harmony Search Algorithm

Published in IETE Journal of Research, 2019

Vijay Kumar, Dinesh Kumar

The literature reports many algorithms for clustering gene expression data. K-Means is one well-known technique used for this purpose. However, it suffers from local optima and sensitivity to parameter initialization [2]. Fuzzy C-Means (FCM) is a fuzzy modification of K-Means in which each gene is assigned a degree of membership in every cluster [11]. For each gene, the membership value for a cluster is proportional to the gene's similarity to that cluster's mean. Partitioning around medoids (PAM) is another widely used clustering algorithm. It computes a medoid for each cluster instead of a mean. However, PAM is unable to handle high-dimensional data-sets and is prone to the parameter initialization problem. Because of these issues, it is seldom used on gene expression data-sets [12].

The self-organizing map (SOM) is another widely used technique for clustering gene expression data-sets. Tamayo et al. [4] used SOM to identify clusters in gene expression data-sets. Compared to K-Means and FCM, SOM is more robust to noisy data. However, it is difficult to find clustering boundaries and balanced solutions with SOM [13]. Eisen et al. [14] identified groups of co-regulated yeast genes by applying a variant of the hierarchical average-link clustering algorithm. Lengfelder et al. [15] proposed an algorithm that combines the strengths of hierarchical clustering and PAM. They applied it to simulated and gene expression data-sets, where it performed better than both PAM and hierarchical clustering. However, it suffers from a parameter tuning problem. Liang and Wang [16] developed a dynamic agglomerative clustering technique for gene expression data-sets. Clusters are grouped dynamically and all scattered genes are accumulated in a single cluster. However, this approach is not robust to noise and has high computational complexity [2].

Qin [17] proposed an improved model-based Bayesian approach known as weighted Chinese restaurant process (CRC). It is used to cluster microarray gene expression data. The iterative weighted Chinese restaurant seating scheme is used to find the optimal number of clusters.
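To make the Fuzzy C-Means membership idea mentioned above concrete, here is a minimal sketch of the standard FCM membership formula, in which a gene's membership in a cluster grows as its distance to that cluster's mean shrinks. The function name `fcm_memberships`, the Euclidean distance, and the fuzzifier `m = 2` are assumptions for illustration. The excerpt does not say which variant reference [11] uses.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Return membership degrees of `point` in each cluster (they sum to 1).

    Uses the standard FCM rule u_j = d_j^(-2/(m-1)) / sum_k d_k^(-2/(m-1)),
    where d_j is the Euclidean distance to center j and m > 1 is the
    fuzzifier (assumed here to be 2).
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


if __name__ == "__main__":
    # Hypothetical two-dimensional expression profile and two cluster means.
    print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)]))
```

Unlike K-Means, which assigns each gene to exactly one cluster, these soft memberships let a gene contribute partially to several clusters.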
