Data Mining – Unsupervised Learning
Published in Rakesh M. Verma, David J. Marchette, Cybersecurity Analytics, 2019
Rakesh M. Verma, David J. Marchette
There are two different approaches to measuring the quality of a clustering: extrinsic versus intrinsic. In the extrinsic approach there is external information available, perhaps in the form of a labeling of the data items into classes. In this case, we can compute the purity of a cluster c by taking the percentage of items in c that belong to the most frequent class of c, which is considered as the label of the cluster. We can then calculate the purity of the clustering as the average purity over all the clusters. Besides purity, we can use maximum matching, F-measure, conditional entropy, normalized mutual information, and variation of information. The latter three measures are referred to as entropy-based measures. There are also correlation-based measures: the discretized Hubert statistic [204] and its normalized version.
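The purity computation described above can be sketched in a few lines. This is a minimal illustration, not code from the book: the function names and the toy spam/ham labels are hypothetical, and the overall score is the plain (unweighted) average over clusters, following the wording of the text. Some references weight each cluster by its size instead.

```python
from collections import Counter

def cluster_purity(class_labels):
    """Fraction of items in one cluster that belong to its most frequent class."""
    counts = Counter(class_labels)
    return counts.most_common(1)[0][1] / len(class_labels)

def clustering_purity(cluster_ids, class_ids):
    """Return (unweighted mean purity, per-cluster purity dict)."""
    members = {}
    for cluster, label in zip(cluster_ids, class_ids):
        members.setdefault(cluster, []).append(label)
    purities = {c: cluster_purity(labels) for c, labels in members.items()}
    return sum(purities.values()) / len(purities), purities

# Hypothetical toy data: 7 items, 2 clusters, 2 classes.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
class_ids = ["spam", "spam", "ham", "ham", "ham", "ham", "spam"]
overall, per_cluster = clustering_purity(cluster_ids, class_ids)
print(per_cluster, overall)
```

Here cluster 0 is labeled "spam" (2 of 3 items) and cluster 1 is labeled "ham" (3 of 4 items), so the overall purity is the mean of 2/3 and 3/4.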
Clustering Validation Measures
Published in Charu C. Aggarwal, Chandan K. Reddy, Data Clustering, 2018
The Mutual Information (MI) and Variation of Information (VI) were developed in the field of information theory [8]. MI measures how much information one random variable can tell about another one [55]. VI measures the amount of information that is lost or gained in changing from the class set to the cluster set [42].
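As a rough sketch of how these two measures relate, the snippet below estimates MI and VI from two label vectors using the standard identities I(A;B) = Σ p(a,b) log[p(a,b) / (p(a)p(b))] and VI(A,B) = H(A) + H(B) − 2I(A;B). The function names are hypothetical, probabilities are simple empirical frequencies, and natural logarithms are an assumption (the excerpt does not fix a base).

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical Shannon entropy H of a label vector (natural log)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Empirical mutual information I(A; B) between two label vectors."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in count_ab.items()
    )

def variation_of_information(a, b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B); zero when the partitions coincide."""
    return entropy(a) + entropy(b) - 2 * mutual_information(a, b)

classes = [0, 0, 1, 1]
same_partition = [1, 1, 0, 0]   # same grouping, labels renamed
unrelated = [0, 1, 0, 1]        # statistically independent of `classes`
print(variation_of_information(classes, same_partition))
print(variation_of_information(classes, unrelated))
```

A clustering that reproduces the classes up to renaming gives VI close to zero, while an independent clustering gives MI of zero and the largest VI for these label counts.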
Residual Domain-Rich Models and their Application in Distinguishing Photo-Realistic and Photographic Images
Published in Mangey Ram, Recent Advances in Mathematics for Engineering, 2020
Prakhar Pradhan, Vinay Verma, Sharad Joshi, Mohit Lamba, Nitin Khanna
In this section, we discuss the residual domain and why feature extraction from the residual domain is preferred. The residual domain of an image is the output obtained by passing the image through a high-pass filter. High-pass filtering highlights the details of an image and suppresses smooth regions, such as blue sky or other flat regions lacking texture. Features extracted from the residual domain are beneficial because they tend to be independent of the image content. The decision of a classifier should not be biased by the image content, and hence features obtained from the residual domain are more robust and generalize better. Images captured by most cameras have certain dependencies among neighboring pixels due to natural scene complexity as well as various digital signal processing operations such as color filter array demosaicing, gamma correction, and filtering on the irradiance values [2]. These spatial dependencies between neighboring pixels are shown graphically in [3]. The authors of [3] used 10,700 grayscale images from the BOWS2 dataset [3] to show that the joint probability of occurrence of two adjacent pixels follows a near-linear profile. From this fact, we can deduce that for natural images the joint probability distribution of adjacent pixels will not vary much; in other words, the differences between neighboring pixels will be smaller for natural or uncorrupted images. Using information-theoretic tools such as entropy and mutual information, [3] further showed that the shape of the joint probability distribution remains unchanged as the pixel value varies. Mathematically, the joint probability P(I_{i,j}, I_{i,j+1}) between two adjacent pixels is measured, where I_{i,j} and I_{i,j+1} are the values of two pixels adjacent to each other in the horizontal direction.
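To make the idea of a residual concrete, here is a minimal sketch that uses the first-order horizontal difference I_{i,j+1} − I_{i,j} as the high-pass filter. That choice of filter is an assumption made for illustration; the excerpt does not say which high-pass filter the authors use. The function name and the toy pixel rows are hypothetical, and images are plain nested lists to keep the example dependency-free.

```python
def horizontal_residual(image):
    """First-order horizontal difference: r[i][j] = I[i][j+1] - I[i][j]."""
    return [
        [right - left for left, right in zip(row, row[1:])]
        for row in image
    ]

# Hypothetical 8-bit grayscale rows.
flat_sky = [[200, 200, 200, 200], [200, 200, 200, 200]]   # smooth region
edge = [[50, 50, 220, 220], [50, 50, 220, 220]]           # vertical edge

print(horizontal_residual(flat_sky))  # smooth content is suppressed
print(horizontal_residual(edge))      # the edge survives as a large value
```

The flat region maps to an all-zero residual regardless of its brightness, which illustrates why residual features depend less on image content than raw pixel values do.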
It can be deduced that the pixel values I_{i,j} and I_{i,j+1} should be close to each other for uncorrupted images. Histograms of pairs, triples, and larger groups of neighboring pixels can be used to model the dependencies among pixels in natural images. However, this method of modeling dependencies is inefficient for the following reasons [3]:
• In an eight-bit grayscale image, pixel values lie between 0 and 255, so the joint histogram of two neighboring pixels can have 256^2 = 65,536 bins.
• Some combinations of pixel values are very unlikely to occur together, for example, pixel values of 255 and 0 adjacent to each other in an eight-bit grayscale image. The corresponding bins in the histogram will be empty and thus act as noise in the features.
• The features obtained using the histogram are dependent on the image content.
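The sparsity problem listed above is easy to see on a toy example. The sketch below builds the joint histogram of horizontally adjacent pixel pairs, normalizes it into an empirical estimate of P(I_{i,j}, I_{i,j+1}), and counts how many of the 256^2 possible bins stay empty. The function names and the tiny image are hypothetical and chosen only for illustration.

```python
from collections import Counter

def horizontal_pair_counts(image):
    """Count each (left, right) pair of horizontally adjacent pixel values."""
    counts = Counter()
    for row in image:
        counts.update(zip(row, row[1:]))
    return counts

def joint_probability(counts):
    """Normalize pair counts into an empirical joint probability."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical smooth 8-bit grayscale patch.
image = [
    [10, 11, 12, 12],
    [11, 12, 12, 13],
]
counts = horizontal_pair_counts(image)
probs = joint_probability(counts)

total_bins = 256 ** 2                   # all possible (left, right) pairs
empty_bins = total_bins - len(counts)   # bins never observed in this image
print(probs)
print(total_bins, empty_bins)
```

Every observed pair lies on or next to the diagonal (left ≈ right), and almost all of the 65,536 bins remain empty, which matches the first two reasons given above.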
A new fusion framework for motion segmentation in dynamic scenes
Published in International Journal of Image and Data Fusion, 2021
Additionally, to provide a qualitative comparison of the performance of the proposed method versus another set of methods, we present an example of an experiment in Figure 6. In this experiment, our model is compared against the layered dynamic textures (LDT) (Chan and Vasconcelos 2009), the supervised and unsupervised (learning-metric-based) approaches presented in (Teney et al. 2015), and the dynamic texture model (DTM) (Chan and Vasconcelos 2008). The result of the presented method, as illustrated in the sixth column, is significantly better than those of the other methods. In Figure 7 we present additional segmentation results obtained from the SynthDB dataset with our method. Results on the complete dataset are publicly available on the website of the corresponding author at http://www-etud.iro.umontreal.ca/khelifil/ResearchMaterial/consensus-video-seg.html. Besides the GCE criterion, we have also tested the effects of using different fusion criteria. Thus, in Table 2, we report the performances yielded by our algorithm based on the following criteria:
• Probabilistic Rand Index (PRI), in which agreements and disagreements are weighted based on the probability of their occurring by chance (Carpineto and Romano 2012).
• Variation of information (VoI), in which the information shared between two partitions is measured in terms of the amount of information that is lost or gained in changing from one clustering to another (Khelifi and Mignotte 2017a).
• F-measure, which is based on the combination of two complementary measures, namely precision (P) and recall (R) (Khelifi and Mignotte 2017a).
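For reference, the most common way to combine precision and recall is the balanced harmonic mean, F = 2PR / (P + R). The sketch below assumes that form; the article does not state which weighting of P and R it uses, and the function name is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: no correct detections at all
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.8, 0.6))
```

Because the harmonic mean is dominated by the smaller of the two values, a segmentation scores well only when precision and recall are both high.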