Applications in Genomics
Published in Sylvia Frühwirth-Schnatter, Gilles Celeux, Christian P. Robert, Handbook of Mixture Analysis, 2019
Stéphane Robin, Christophe Ambroise
Biclustering is a technique from two-way data analysis that aims to find structure in both the rows and the columns of a data table. This approach is popular for exploring DNA microarrays, since there is often structure in both samples and genes. Looking for a gene/sample block structure can be achieved in two steps (one step for each dimension) or simultaneously in both dimensions (Ben-Dor et al., 2003). A widespread graphical representation of this approach is the classical heatmap, which is a false-color image of the data table with the rows and columns reordered according to some identified latent structure.
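The reordering behind such a heatmap can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function name `reorder_by_labels`, the toy table, and the cluster labels are all assumptions, and the labels are taken as given rather than estimated.

```python
# Sketch: reorder a data table by row and column cluster labels so that
# blocks become contiguous, as in a clustered heatmap.
# The labels are assumed to come from some earlier clustering step.

def reorder_by_labels(table, row_labels, col_labels):
    """Return the table with rows and columns sorted by cluster label."""
    # sorted() is stable, so items keep their original order within a cluster.
    row_order = sorted(range(len(row_labels)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(col_labels)), key=lambda j: col_labels[j])
    reordered = [[table[i][j] for j in col_order] for i in row_order]
    return reordered, row_order, col_order

# Toy expression table: 4 genes (rows) x 4 samples (columns).
table = [
    [9, 1, 8, 1],
    [1, 7, 2, 8],
    [8, 2, 9, 1],
    [2, 9, 1, 7],
]
row_labels = [0, 1, 0, 1]  # hypothetical gene clusters
col_labels = [0, 1, 0, 1]  # hypothetical sample clusters

reordered, row_order, col_order = reorder_by_labels(table, row_labels, col_labels)
for row in reordered:
    print(row)
```

After reordering, the high values sit in the top-left and bottom-right blocks, which is the pattern a heatmap would render as two contrasting squares.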
Clustering Biological Data
Published in Charu C. Aggarwal, Chandan K. Reddy, Data Clustering, 2018
Chandan K. Reddy, Mohammad Al Hasan, Mohammed J. Zaki
The objective of biclustering is to simultaneously cluster both rows and columns in a given matrix. Biclustering algorithms aim to discover local patterns that cannot be identified by the traditional one-way clustering algorithms. A bicluster can be defined as a subset of genes that are correlated under a subset of biological conditions (or samples). Biclustering has been used in several applications such as clustering microarray data [71], identifying protein interactions [68], and other data mining applications such as collaborative filtering [40] and text mining [18].
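The idea of a local pattern can be illustrated with a small Python sketch. The two gene profiles and the chosen subset of conditions below are made-up values, and Pearson correlation is used here only as one possible notion of "correlated"; the excerpt does not prescribe a specific measure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical expression profiles of two genes over 8 conditions.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 4.0, 2.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 1.0, 9.0, 2.0, 8.0]

subset = [0, 1, 2, 3]  # assumed subset of conditions forming the bicluster

global_corr = pearson(gene_a, gene_b)
local_corr = pearson([gene_a[j] for j in subset], [gene_b[j] for j in subset])

print("all conditions:", round(global_corr, 3))
print("subset only:   ", round(local_corr, 3))
```

Over all eight conditions the two genes look weakly anti-correlated, so a one-way clustering of full profiles would be unlikely to group them; restricted to the subset they move together perfectly.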
Cluster analysis
Published in Catherine Dawson, A–Z of Digital Research Methods, 2019
Techniques and procedures that can be used for cluster analysis include (in alphabetical order):
Biclustering: simultaneous clustering of the rows and columns of a data matrix. It can also be referred to as co-clustering, block clustering, two-dimensional clustering or two-way clustering. See Dolnicar et al. (2012) for a discussion on this technique.
Consensus clustering: a number of clusters from a dataset are examined to find a better fit. See Șenbabaoğlu et al. (2014) for an analysis and critique of this technique.
Density-based spatial clustering: used to filter out noise and outliers and to discover clusters of arbitrary shape. See Li et al. (2015) for an example of this technique used together with mathematical morphology clustering.
Fuzzy clustering: data points have the potential to belong to more than one cluster. See Grekousis and Hatzichristos (2013) for an example of a study that uses this clustering technique.
Graph clustering: this can include between-graph clustering (clustering a set of graphs) and within-graph clustering (clustering the nodes/edges of a single graph). See Hussain and Asghar (2018) for a discussion on graph-based clustering methods that are combined with other clustering procedures and techniques.
Hierarchical clustering: a hierarchy of clusters is built using either a bottom-up approach (agglomerative) that combines clusters, or a top-down approach (divisive) that splits clusters. It is also known as nesting clustering. See Daie and Li (2016) for an example of a study that uses matrix-based hierarchical clustering.
K-means clustering: an iterative partitioning method of cluster analysis in which data are clustered based on their similarity (the number of clusters is known and specified within the parameters of the clustering algorithm). See Michalopoulou and Symeonaki (2017) for an example of a study that combines k-means clustering and fuzzy c-means clustering.
Model-based clustering: a method in which certain models for clusters are used and the best fit between the data and the models is determined (the model defines clusters). Malsiner-Walli et al. (2018) discuss this clustering technique in their paper.
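As one concrete instance of the techniques listed above, k-means clustering can be sketched in plain Python. This is a minimal one-dimensional illustration under stated assumptions: the function name `kmeans_1d`, the data points, and the fixed initial centroids are hypothetical, and the number of clusters is simply the number of initial centroids supplied.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data; k is the number of initial centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no change: converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]  # made-up data
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

Fixing the initial centroids keeps the sketch deterministic; practical implementations usually choose them at random or with a seeding heuristic and repeat the run several times.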
Hybrid Cuckoo Search with Clonal Selection for Triclustering Gene Expression Data of Breast Cancer
Published in IETE Journal of Research, 2023
P. Swathypriyadharsini, K. Premalatha
In biological research, gene expression is initially analyzed across different samples, and subsets of genes that are expressed similarly across those samples are grouped together in a cluster. Applying clustering across two dimensions in this way is the biclustering approach first proposed by Cheng and Church. Their algorithm accepts a group of genes and samples as a bicluster when its mean squared residue score does not exceed a predefined threshold δ. A time–frequency based full-space algorithm proposed by Feng et al. clusters genes based on the correlation between genes over the time course [4]. In addition to the two dimensions of genes and samples, Zhao et al. considered a third dimension, the time point, by proposing the Tricluster algorithm [5]. Their work focuses on clustering the genes of real microarray datasets and proposes metrics to assess the efficiency of the triclusters. Jiang et al. proposed the “gTricluster” algorithm, which mines three-dimensional clusters on gene-sample-time (GST) microarray data [6]. It groups genes that are consistent across a subset of samples and time segments. Yin et al. suggested the ts-cluster, a variant of the tricluster that clusters genes based on a time-shifting relationship [7]. The algorithm is also robust to noise because it considers the relative expression magnitude rather than the absolute value. Hu and Bhatnagar proposed a triclustering algorithm that applies subspace clustering to ChIP-Seq data. This algorithm extracts a tricluster from two independent biclusters such that the standard deviation within each bicluster obeys an upper bound and the overlap between the two biclusters is maximal [8].
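Cheng and Church score a candidate bicluster by its mean squared residue and compare that score with δ. The score itself can be sketched as follows; this covers only the scoring step, not their node-deletion search, and the function name, the toy matrix, and the value of `delta` are illustrative assumptions.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by rows x cols."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing row, column and overall effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)

# Toy expression matrix: 4 genes (rows) x 4 samples (columns).
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 1.0, 8.0, 2.0],
]
delta = 0.5  # hypothetical threshold

candidate_score = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
full_score = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])

print("candidate is a delta-bicluster:", candidate_score <= delta)
print("full matrix is a delta-bicluster:", full_score <= delta)
```

In the candidate submatrix every gene follows the same profile shifted by a constant, so its residue is essentially zero, while the full matrix scores well above the threshold.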