BERT- and FastText-Based Research Paper Recommender System
Published in Pallavi Vijay Chavan, Parikshit N Mahalle, Ramchandra Mangrulkar, Idongesit Williams, Data Science, 2022
Nemil Shah, Yash Goda, Naitik Rathod, Vatsal Khandor, Pankaj Kulkarni, Ramchandra Mangrulkar
Stop words are excluded from the candidate keywords to save computational resources. The candidates could be n-grams for arbitrarily large n, but since the aim is to find candidate keywords rather than keyphrases, the feature space is restricted to 1-grams, 2-grams, and 3-grams. Because neural networks cannot process textual data directly, each keyword is then assigned a unique BERT embedding, and a single embedding is generated for the whole abstract. The cosine similarity between each candidate embedding and the abstract embedding is then calculated. Cosine similarity measures the similarity of two vectors: it is computed from their dot product as the cosine of the angle between them, which indicates whether the two vectors point in the same direction, and it is commonly used to compare documents. The ten keywords with the highest similarity are then chosen. However, the chosen keywords may be synonyms of one another and might not represent the abstract adequately. To solve this issue, the cosine similarity between each chosen keyword and the other chosen candidates is calculated; if the similarity is above a chosen threshold, the candidate keyword is eliminated. This ensures that the candidate keywords are diverse. These keywords are then added to the rest of the data to build the recommendation system.
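The ranking-plus-deduplication step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy 2-D vectors stand in for BERT embeddings, and the function names and threshold value are assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_keywords(candidates, cand_vecs, doc_vec, top_n=10, dedup_threshold=0.9):
    # Step 1: rank candidates by similarity to the whole-abstract embedding.
    scored = sorted(zip(candidates, cand_vecs),
                    key=lambda cv: cosine(cv[1], doc_vec), reverse=True)
    # Step 2: walk the ranking, dropping any candidate that is too similar
    # to one already kept, so the final keyword set stays diverse.
    kept = []
    for word, vec in scored:
        if all(cosine(vec, kept_vec) < dedup_threshold for _, kept_vec in kept):
            kept.append((word, vec))
        if len(kept) == top_n:
            break
    return [word for word, _ in kept]
```

With a near-duplicate pair among the candidates, only one of the pair survives the threshold check, which is exactly the diversity property the excerpt describes.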
Natural Language Processing (NLP) Methods for Cognitive IoT Systems
Published in Pethuru Raj, Anupama C. Raman, Harihara Subramanian, Cognitive Internet of Things, 2022
Pethuru Raj, Anupama C. Raman, Harihara Subramanian
One of the simplest and most usable metrics is edit distance (also known as Levenshtein distance). Levenshtein distance estimates the similarity of two string values (a word, word form, or word composition) by counting the minimum number of operations needed to convert one value into the other. Popular NLP applications of edit distance include automatic spell-checking (correction) systems; bioinformatics, for quantifying the similarity of DNA sequences (viewed as strings of letters); and text processing, for finding all words in close proximity to a given text object. Cosine similarity is a metric used to measure text similarity across documents. It is computed from the well-known cosine formula for vectors: cos(θ) = (A · B) / (‖A‖ ‖B‖).
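The edit-distance metric described above can be computed with the standard dynamic-programming recurrence. A minimal sketch (the function name is illustrative):

```python
def levenshtein(s, t):
    # Row-by-row dynamic programming: after processing row i,
    # prev[j] holds the edit distance between s[:i] and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution (free on match)
        prev = cur
    return prev[-1]
```

For example, converting "kitten" into "sitting" takes three operations (two substitutions and one insertion), so the distance is 3.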
Building simulation for practical operational optimization
Published in Jan L.M. Hensen, Roberto Lamberts, Building Performance Simulation for Design and Operation, 2019
David E. Claridge, Mitchell T. Paulus
For fault diagnosis, Lin and Claridge (2017) investigated the use of two similarity measures – namely cosine similarity and Euclidean distance similarity – to isolate the cause(s) of identified abnormal energy consumption. These methods require only building energy consumption and weather data as inputs. Cosine similarity measures the similarity between two vectors based on the cosine of the angle between them. Euclidean distance similarity evaluates the similarity between two vectors based on the distance between them. Both can quantify the level of similarity between the observed fault pattern and different reference fault patterns. The most similar fault pattern in the reference database is identified as the cause of the observed fault. When applied to the Sbisa data, cosine similarity identified “outside airflow temperature precooling decrease” as the most probable of 12 possible causes, while Euclidean distance similarity indicated that “outside airflow temperature precooling decrease” was slightly less probable as the cause than “outside airflow ratio increase.” When applied to data from another campus building where a preheat valve had been detected to be leaking, both methods correctly identified the leaking preheat valve as the most probable cause of the fault.
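The nearest-reference-pattern idea behind both measures can be sketched in a few lines. The fault names and vectors below are purely illustrative (not the Sbisa data), and mapping Euclidean distance to a similarity score via 1/(1 + d) is one common choice among several:

```python
import numpy as np

def cosine_sim(a, b):
    # Similarity from the angle between the two pattern vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_sim(a, b):
    # Map distance to a similarity in (0, 1]: identical patterns score 1.
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))

def most_similar_fault(observed, reference_patterns, sim=cosine_sim):
    # Return the reference fault whose pattern best matches the observation.
    return max(reference_patterns,
               key=lambda name: sim(observed, reference_patterns[name]))
```

As the excerpt notes, the two measures can rank the candidate causes differently, since one compares direction and the other compares distance.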
Research on Recommendation System of Online Chinese Learning Resources Based on Multiple Collaborative Filtering Algorithms (RSOCLR)
Published in International Journal of Human–Computer Interaction, 2023
Methods for calculating similarity in previous studies include cosine similarity, the Pearson correlation coefficient, the Jaccard similarity coefficient, and so on, or distance measures such as Euclidean distance, Manhattan distance, and Minkowski distance (Miao & Sun, 2003). Any of these common algorithms can be used to determine similarity. The cosine similarity value ranges over [−1, 1]; the closer it is to 1, the greater the similarity. We chose this algorithm because the normalized range of the cosine measure and its geometric interpretation as an angle lend themselves to clear display in charts, and research has shown that this algorithm can not only help learners and educators find useful learning resources, but can also help them learn (Recker et al., 2003).
Document similarity for error prediction
Published in Journal of Information and Telecommunication, 2021
Péter Marjai, Péter Lehotay-Kéry, Attila Kiss
Cosine similarity is a similarity measure between two non-zero vectors of an inner product space. It is equal to the inner product of the vectors after both are normalized to length 1, which is the cosine of the angle between them. Cosine similarity is especially used in the positive space, where the result lies between 0 and 1. If two unit vectors are parallel, they are maximally similar; if they are orthogonal, they are maximally dissimilar. This corresponds to the cosine, which is maximal when the vectors span a zero angle and zero when they are perpendicular. Given the vectors A and B, the cosine similarity, cos(θ), is expressed using the dot product and magnitudes as cos(θ) = (A · B) / (‖A‖ ‖B‖) = Σᵢ AᵢBᵢ / (√(Σᵢ Aᵢ²) √(Σᵢ Bᵢ²)), where Aᵢ and Bᵢ are the components of the vectors A and B.
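The component-wise form of the formula translates directly into code; a minimal standard-library sketch:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (sum of A_i * B_i) / (|A| * |B|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Parallel vectors such as (1, 2) and (2, 4) give a similarity of 1, while orthogonal vectors such as (1, 0) and (0, 1) give 0, matching the geometric description above.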
RFR-DLVT: a hybrid method for real-time face recognition using deep learning and visual tracking
Published in Enterprise Information Systems, 2020
Zhenfeng Lei, Xiaoying Zhang, Shuangyuan Yang, Zihan Ren, Olusegun F. Akindipe
In this paper, the cosine similarity is used as a measure to match faces. After the face image to be recognised is processed by the lightened CNN, we obtain a 512-dimensional feature vector. The identity is recognised by comparing the cosine similarity between the feature vector of the face to be identified and the feature vector of each face in the test set. The cosine similarity, which ranges over [−1, 1], is determined by measuring the cosine of the angle between the two vectors: the closer the cosine similarity of two vectors is to 1, the more similar the vectors are. Given two feature vectors A and B, their cosine similarity is cos(θ) = (A · B) / (‖A‖ ‖B‖). This paper sets a threshold to distinguish which faces belong to the same person, i.e. if the cosine similarity of two faces exceeds this threshold, they are considered faces of the same person.
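The threshold-based matching step can be sketched as below. This is an assumption-laden illustration, not the paper's code: the toy 2-D vectors stand in for the 512-dimensional CNN features, and the gallery structure, function names, and threshold value of 0.5 are all hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine of the angle between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe_vec, gallery, threshold=0.5):
    # Compare the probe embedding against every enrolled embedding and
    # accept the best match only if its similarity clears the threshold;
    # otherwise report the face as unknown (None).
    best_id, best_sim = None, -1.0
    for person_id, vec in gallery.items():
        s = cosine_sim(probe_vec, vec)
        if s > best_sim:
            best_id, best_sim = person_id, s
    return best_id if best_sim >= threshold else None
```

Rejecting matches below the threshold is what lets the system distinguish an enrolled person from an unseen face rather than always returning the nearest neighbour.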