Similarity Principle—The Fundamental Principle of All Sciences
Published in Mark Chang, Artificial Intelligence for Drug Development, Precision Medicine, and Healthcare, 2020
A similarity measure or similarity function is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity measure exists, such measures are usually in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects. For example, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. In the context of cluster analysis, Frey and Dueck suggest defining a similarity measure s(x, y) = −‖x − y‖².
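The two notions above can be sketched in a few lines; a minimal illustration with hypothetical helper names, not code from the chapter:

```python
def hamming_distance(s1: str, s2: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(s1, s2))

def frey_dueck_similarity(x, y):
    """Frey and Dueck's similarity: negative squared Euclidean distance,
    s(x, y) = -||x - y||^2. Larger (closer to 0) means more similar."""
    return -sum((a - b) ** 2 for a, b in zip(x, y))

print(hamming_distance("karolin", "kathrin"))   # 3 substitutions
print(frey_dueck_similarity([0, 0], [3, 4]))    # -25
```

Note how the similarity is the negated (squared) distance, matching the "inverse of a distance metric" intuition: identical objects score 0, dissimilar objects score increasingly negative values.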
Feature Selection
Published in Jan Žižka, František Dařena, Arnošt Svoboda, Text Mining with Machine Learning, 2019
Jan Žižka, František Dařena, Arnošt Svoboda
Term Contribution was proposed by Liu et al. [179] for selecting relevant terms for clustering. Because document similarity is central to clustering, Term Contribution measures how much a term contributes to document similarity, with cosine similarity as the underlying measure. The similarity of two documents, d_i and d_j, is calculated as the dot product of the length-normalized feature vectors representing the documents: sim(d_i, d_j) = ∑_t w(t, d_i) × w(t, d_j),
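The contribution of a single term is then its summed product of weights over all document pairs. A minimal sketch, assuming term weights are already length-normalized tf-idf values (the variable names are hypothetical):

```python
def term_contribution(weights, term):
    """TC(t) = sum over document pairs (i, j), i != j, of w(t, d_i) * w(t, d_j)."""
    docs = list(weights)
    total = 0.0
    for i, di in enumerate(docs):
        for j, dj in enumerate(docs):
            if i != j:
                total += weights[di].get(term, 0.0) * weights[dj].get(term, 0.0)
    return total

# Toy weight table: weights[document][term] -> normalized weight.
w = {
    "d1": {"drug": 0.8, "trial": 0.6},
    "d2": {"drug": 0.5, "cell": 0.9},
    "d3": {"trial": 0.7},
}
print(term_contribution(w, "drug"))   # 0.8  (pairs (d1,d2) and (d2,d1): 2 * 0.8 * 0.5)
print(term_contribution(w, "cell"))   # 0.0  (appears in only one document)
```

A term occurring in only one document contributes nothing to any pairwise similarity, which is exactly why Term Contribution discards it.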
Introduction to Biometry
Published in Ling Guan, Yifeng He, Sun-Yuan Kung, Multimedia Image and Video Processing, 2012
Carmelo Velardo, Jean-Luc Dugelay, Lionel Daniel, Antitza Dantcheva, Nesli Erdogmus, Neslihan Kose, Rui Min, Xuran Zhao
Classification is the problem of identifying subpopulations in a set of input data. For biometrics it means finding a transformation that leads from the feature space to a class space. The purpose of a biometric authentication system is mainly to retrieve the identity of a person, or to verify that a person is who he/she claims to be. The verification problem is a binary classification (genuine vs. impostor), whereas the identification problem is an n-ary classification, where n ∈ ℕ is the number of mutually exclusive identities. A person, represented as a feature set, is classified by measuring its similarity to the template of each class; the person is then said to belong to the class that has the most similar template(s). The classification task is to minimize the intraclass variations (i.e., the variations that a biometric trait exhibits under natural conditions) and to maximize the interclass variations that occur between different persons. For classification, a similarity measure has to be defined. This operation measures the distance of a feature projected into the classification space against all the templates.
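The identification step described above amounts to nearest-template matching. A toy sketch (the templates and the cosine matcher are illustrative choices, not the chapter's specific system):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe, templates, similarity):
    """n-ary identification: assign the probe feature vector to the class
    whose stored template is most similar."""
    return max(templates, key=lambda cls: similarity(probe, templates[cls]))

# Hypothetical per-identity templates in a 2-D classification space.
templates = {"alice": [1.0, 0.1], "bob": [0.1, 1.0]}
print(identify([0.9, 0.2], templates, cosine))  # alice
```

Verification would instead compare the probe against a single claimed identity's template and threshold the similarity (genuine vs. impostor).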
Ontology-based semantic data interestingness using BERT models
Published in Connection Science, 2023
Abhilash CB, Kavi Mahesh, Nihar Sanda
Figure 4 illustrates the SIF-B framework. The data is first converted to RDF format using an ontology and domain knowledge. An association mining algorithm, such as the improved Apriori algorithm, is then applied to generate a rule repository from the RDF data, and the OCA Mining algorithm is applied to obtain the relevant and useful rules required by decision makers; this is where the interesting rules are selected based on predefined constraints. A BERT model is then used to generate semantic embeddings, and the cosine similarity measure, which quantifies the similarity between two non-zero vectors of an inner product space, identifies the semantically rich rules. The diagram provides a visual representation of the proposed framework for uncovering interesting insights in large COVID-19 datasets. The use of ontologies, the improved Apriori algorithm, and the BERT model for evaluating the interestingness of the rules makes the framework unique and promising for finding meaningful relationships and facts in large datasets.
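The final filtering step can be sketched as thresholding cosine similarity between rule embeddings and a reference embedding; all names and vectors below are hypothetical stand-ins (real embeddings would come from a BERT model):

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_semantic_rules(rule_embs, reference_emb, threshold=0.8):
    """Keep rule ids whose embedding is close enough to the reference."""
    return [rid for rid, emb in rule_embs.items()
            if cosine_sim(emb, reference_emb) >= threshold]

# Toy 3-D "embeddings" for two mined rules.
rules = {"r1": [0.9, 0.1, 0.0], "r2": [0.0, 0.2, 0.9]}
print(select_semantic_rules(rules, [1.0, 0.0, 0.0]))  # ['r1']
```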
Feature level fusion framework for multimodal biometric system based on CCA with SVM classifier and cosine similarity measure
Published in Australian Journal of Electrical and Electronics Engineering, 2023
Chetana Kamlaskar, Aditya Abhyankar
By definition (Tan, Steinbach, and Kumar 2006), the Manhattan and Euclidean distances measure the distance between two vectors while taking their magnitude into account. The cosine similarity measure avoids the limitations of the Euclidean distance, which is susceptible to outliers, by focusing only on angular similarity and discarding magnitude. At the matching stage, when matching the fused feature vectors in the canonical subspace, the angles between subspaces are a more practical measurement than distance measures. Hence, simple matchers are chosen to speed up the matching process and to analyse how well the CCA-based feature fusion algorithm performs for the multimodal system. As stated in (7), the best match under the Manhattan distance is found by matching the test vector Z_t against the training vectors.
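The contrast between the three matchers is easy to see on toy vectors (these are illustrative, not the fused CCA features from the paper): scaling a vector changes its Manhattan and Euclidean distances but leaves its cosine similarity untouched.

```python
import numpy as np

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return float(np.abs(a - b).sum())

def euclidean(a, b):
    """L2 distance between two vectors."""
    return float(np.linalg.norm(a - b))

def cosine_sim(a, b):
    """Angle-only similarity; ignores vector magnitude."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0])
b = 10 * a  # same direction, ten times the magnitude
print(manhattan(a, b), euclidean(a, b))  # both grow with the scaling
print(cosine_sim(a, b))                  # ~1.0: scale-invariant
```

This scale invariance is why angle-based matching is attractive when feature magnitudes are unreliable, e.g. in the presence of outliers.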
Matching heterogeneous ontologies with adaptive evolutionary algorithm
Published in Connection Science, 2022
Xingsi Xue, Haolin Wang, Xin Zhou, Guojun Mao, Hai Zhu
In general, ontology matching is aimed at finding the entity correspondences between two heterogeneous ontologies. By means of similarity measures, an ontology matching system can distinguish the heterogeneous entities and generate the ontology alignment (Xue & Chen, 2021). To be specific, a similarity measure can be seen as a function that calculates to what extent two entities are similar, outputting a real number from 0 to 1. The frequently used similarity measures fall into three categories, i.e. syntax-based, semantic-based, and structure-based measures (Rahm & Bernstein, 2001). Since a single similarity measure cannot ensure a confident result, it is usually necessary to combine several measures. How to determine the aggregating weights that yield a high-quality alignment is the so-called ontology meta-matching problem (Martinez-Gil & Aldana-Montes, 2011), which is a challenging problem in the ontology matching domain.
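The weighted combination at the heart of meta-matching can be sketched as a weighted average of several measures, each returning a score in [0, 1]; the two toy measures below are illustrative stand-ins, not the paper's actual measures:

```python
def syntax_sim(e1, e2):
    """Toy syntax-based measure: normalized longest-common-prefix length."""
    n = 0
    for a, b in zip(e1.lower(), e2.lower()):
        if a != b:
            break
        n += 1
    return n / max(len(e1), len(e2))

def exact_sim(e1, e2):
    """Toy measure: 1.0 on case-insensitive exact match, else 0.0."""
    return 1.0 if e1.lower() == e2.lower() else 0.0

def aggregate(measures, weights, e1, e2):
    """Weighted average of measure outputs; the meta-matching problem is
    choosing these weights so the resulting alignment is high-quality."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(w * m(e1, e2) for m, w in zip(measures, weights))

score = aggregate([syntax_sim, exact_sim], [0.7, 0.3], "Author", "AuthorName")
print(round(score, 3))  # 0.42 = 0.7 * 0.6 + 0.3 * 0.0
```

An evolutionary algorithm, as in the paper, would search over the weight vector itself, evaluating each candidate weighting by the quality of the alignment it produces.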