Introduction
Published in Sugato Basu, Ian Davidson, Kiri L. Wagstaff, Constrained Clustering, 2008
Sugato Basu, Ian Davidson, Kiri L. Wagstaff
Abstract: CORRELATION CLUSTERING addresses clustering problems in which the algorithm has access to some information regarding whether pairs of items should be grouped together or not. Such problems arise in areas like database consistency and natural language processing. Traditional clustering formulations such as k-means assume that there is some type of distance measure (metric) on the data items, and often specify the number of clusters that should be formed. In CORRELATION CLUSTERING, however, the number of clusters to be built need not be specified in advance: it can be an outcome of the objective function. Furthermore, instead of a distance function we are given advice as to which pairs of items are similar. This chapter formalizes the CORRELATION CLUSTERING model and presents several approximation algorithms for various clustering objectives.
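The idea that the number of clusters falls out of the pairwise advice can be illustrated with a small sketch. It is not taken from the chapter: the function names, the toy advice, the greedy pivot-style heuristic, and the choice to treat missing advice as "dissimilar" are all assumptions made for illustration. The cost counted here is the number of pairs whose advice contradicts the clustering.

```python
def disagreements(clusters, advice):
    """Count advised pairs that the clustering contradicts.

    clusters: dict mapping item -> cluster label
    advice:   dict mapping frozenset({a, b}) -> True (similar) or False (dissimilar)
    """
    cost = 0
    for pair, is_similar in advice.items():
        a, b = tuple(pair)
        together = clusters[a] == clusters[b]
        if together != is_similar:
            cost += 1
    return cost


def pivot_clustering(items, advice):
    """Greedy pivot-style heuristic (an illustrative assumption, not the
    chapter's algorithm): take the first unclustered item as the pivot and
    group with it every remaining item advised to be similar to it.
    Pairs with no advice are treated as dissimilar (assumption)."""
    remaining = list(items)
    clusters = {}
    label = 0
    while remaining:
        pivot = remaining[0]
        members = [pivot] + [
            x for x in remaining[1:]
            if advice.get(frozenset((pivot, x)), False)
        ]
        for m in members:
            clusters[m] = label
        remaining = [x for x in remaining if x not in members]
        label += 1
    return clusters


# Hypothetical advice on five items; note that it is inconsistent:
# a~b and a~c, yet b and c are advised to be apart.
items = ["a", "b", "c", "d", "e"]
advice = {
    frozenset(("a", "b")): True,
    frozenset(("a", "c")): True,
    frozenset(("b", "c")): False,
    frozenset(("d", "e")): True,
    frozenset(("a", "d")): False,
    frozenset(("c", "e")): False,
}

clusters = pivot_clustering(items, advice)
n_clusters = len(set(clusters.values()))
cost = disagreements(clusters, advice)
```

No cluster count is passed in; it emerges from the advice. Because the toy advice is inconsistent, no clustering can satisfy every pair, which is why the problem is posed as minimizing disagreements (or maximizing agreements) rather than satisfying all advice.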
Wind speed forecasting using deep learning and preprocessing techniques
Published in International Journal of Green Energy, 2023
Managing the input data is another way to improve the performance of a forecasting model. Feature selection and feature extraction are two ways to reduce the dimensionality of the input data. Feature selection picks a minimal subset of the input feature set, while feature extraction maps the original set to a new set of features (Liu and Chen 2019a). Selecting the right set of features is extremely important, since the selected set will be the only source of information for the learning algorithm. The aim is to avoid selecting too many or too few features. If too few features are selected, the information content of the feature set is likely to be low. If too many are selected, the noise present in the data may obscure the useful information and may increase computational cost and the risk of overfitting (Senthil and Lopez 2015). The feature selection methods used in hybrid wind speed forecasting models are usually based on correlation, clustering, and information.
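A correlation-based filter of the kind mentioned above might look like the following sketch. It is only an illustration under stated assumptions: the feature names, the toy values, and both thresholds (`min_relevance`, `max_redundancy`) are hypothetical and do not come from the article. Features weakly correlated with the target are dropped as uninformative, and features strongly correlated with an already selected feature are dropped as redundant.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear information
    return cov / (sx * sy)


def select_features(features, target, min_relevance=0.5, max_redundancy=0.9):
    """Rank features by |correlation with target|, then keep those that are
    relevant enough and not near-duplicates of an already kept feature.
    Both thresholds are illustrative assumptions."""
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    ranked = sorted(features, key=lambda name: relevance[name], reverse=True)
    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            break  # every later feature is even less relevant
        if all(abs(pearson(features[name], features[s])) < max_redundancy
               for s in selected):
            selected.append(name)
    return selected


# Hypothetical toy data: the target is the wind speed to be forecast.
target = [1, 2, 3, 4, 5, 6]
features = {
    "speed_lag1": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],  # highly relevant
    "speed_lag2": [1.0, 2.2, 3.0, 4.1, 4.8, 6.3],  # relevant but redundant
    "pressure":   [2, 1, 4, 3, 6, 5],              # moderately relevant
    "noise":      [3, 1, 4, 1, 5, 2],              # nearly uncorrelated
}

selected = select_features(features, target)
```

On this toy data the filter keeps one of the two lagged speeds plus the pressure series and discards the noise column, which mirrors the trade-off described above between too few and too many features.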
Multiple profiles sensor-based monitoring and anomaly detection
Published in Journal of Quality Technology, 2018
Chen Zhang, Hao Yan, Seungho Lee, Jianjun Shi
In this subsection we discuss how to cluster multichannel profiles based on reference samples. As mentioned in Section 1.1, sensor profiles can be naturally clustered according to their cross-correlation matrix. We therefore adopt the agglomerative hierarchical correlation clustering method, which begins by treating each sensor as a separate cluster and then successively merges clusters into larger ones according to the sensor correlations. In each step, the algorithm finds the closest pair of clusters and merges them into a new parent cluster. This is repeated until only one cluster is left, which takes one fewer merge than the number of sensors. Pearson's correlation, estimated from the reference samples for each pair of sensors, is used to measure the similarity between different sensors. The distance (dissimilarity) between different sensors is then defined in terms of this correlation.
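The merging procedure described in this excerpt can be sketched as follows. The excerpt does not preserve the paper's correlation estimator or distance formula, so this sketch assumes the plain sample Pearson correlation, a dissimilarity of 1 - |r|, and average linkage between clusters; the sensor names and readings are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def agglomerate(profiles, n_clusters=1):
    """Merge the closest pair of clusters until n_clusters remain.

    profiles: dict mapping sensor name -> list of readings
    Returns (clusters, merges), where merges records each merged pair.
    """
    names = list(profiles)
    # Assumed dissimilarity: 1 - |r| (the paper's own formula is not shown).
    dist = {
        frozenset((a, b)): 1 - abs(pearson(profiles[a], profiles[b]))
        for i, a in enumerate(names) for b in names[i + 1:]
    }

    def linkage(c1, c2):
        # Average linkage between two clusters (an assumption).
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]  # each sensor starts alone
    merges = []
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((set(clusters[i]), set(clusters[j])))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical readings from four sensors: s1/s2 move together, as do s3/s4.
profiles = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [2, 4, 6, 8, 11],
    "s3": [5, 1, 4, 2, 3],
    "s4": [10, 2, 9, 4, 6],
}

two_groups, _ = agglomerate(profiles, n_clusters=2)
one_group, all_merges = agglomerate(profiles, n_clusters=1)
```

Stopping at two clusters groups the sensors whose profiles are strongly correlated, while running to a single cluster records one fewer merge than there are sensors, matching the description above. Using the absolute value of the correlation treats strongly negative correlations as similar; a signed distance such as 1 - r would be the alternative if that is not desired.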
Molecular-level exploration of properties of dissolved organic matter in natural and engineered water systems: A critical review of FTICR-MS application
Published in Critical Reviews in Environmental Science and Technology, 2023
Mingqi Ruan, Fengchang Wu, Fuhong Sun, Fanhao Song, Tingting Li, Chen He, Juan Jiang
Statistical techniques including rank correlation, clustering analysis, and data fusion models are needed to deal with the multiple datasets produced by FTICR-MS and other supplementary methods (Figure 2c) (Sleighter et al., 2010). The rank correlations between fluorescence and MS parameters of DOM show that coagulation processes selectively remove DOM with abundant O-bearing functional groups (Lavonen et al., 2015). Moreover, multivariate methods including principal component analysis (PCA) and hierarchical clustering analysis (HCA) are used to analyze differences in DOM components based on dimension reduction and summarization of the main features (Figure 2c) (B. Zhang et al., 2019). The relationships between groundwater DOM variables and FTICR-MS molecules established by PCA indicate that groundwater DOM in high rainfall areas shows higher molecular weight and aromaticity than DOM in semi-arid areas (McDonough, Rutlidge, et al., 2020). Furthermore, a more advanced method, advanced coupled matrix and tensor factorization (ACMTF), has been developed to explain DOM datasets of a heterogeneous nature (Figure 2c) (Wünsch et al., 2018). The ACMTF model provides more intuitive chemical results for DOM molecules by imposing non-negativity constraints and the degree of molecular formula loadings. Wünsch et al. (2018) used the ACMTF model to establish the connection between molecular formula and fluorescence information of DOM and described the attributes of ACMTF components including Stokes shift, O/C, H/C, m/z, and DBE. The outcomes obtained through ACMTF are more specific than those of traditional correlation analysis, which helps yield broader insight into DOM composition.