Multi-Layer Electronic Supply Chain: Intelligent Information System
Published in Hamed Fazlollahtabar, Supply Chain Management Models: Forward, Reverse, Uncertain, and Intelligent, 2018
An important component of a clustering algorithm is the distance measure between data points. If the components of the data instance vectors are all in the same physical units, the simple Euclidean distance metric may be sufficient to group similar data instances successfully. Even in this case, however, the Euclidean distance can be misleading: despite both measurements being taken in the same physical units, an informed decision has to be made about their relative scaling. Note that this is not merely a graphical issue. The problem arises from the mathematical formula used to combine the distances between the individual components of the data feature vectors into a single distance measure used for clustering: different formulas lead to different clusterings.
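This scaling effect can be sketched in a few lines of Python. The sensor readings, units, and rescaling factors below are purely illustrative:

```python
import math

def euclidean(a, b):
    """Plain Euclidean (l2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical readings: (temperature in deg C, pressure in Pa).
a = (20.0, 101325.0)
b = (21.0, 101825.0)   # 1 deg warmer, 500 Pa higher
c = (35.0, 101325.0)   # 15 deg warmer, same pressure

# On raw values the pressure component (large numbers) dominates,
# so a looks far closer to c than to b:
print(euclidean(a, b))   # ~500.0
print(euclidean(a, c))   # 15.0

# After dividing each component by a chosen "typical spread"
# (10 deg and 1000 Pa here -- an arbitrary modelling decision),
# the ordering flips, and any clustering built on it changes too:
scale = (10.0, 1000.0)
def scaled(v):
    return tuple(x / s for x, s in zip(v, scale))

print(euclidean(scaled(a), scaled(b)))  # ~0.51
print(euclidean(scaled(a), scaled(c)))  # 1.5
```

The point is that the rescaling is itself a modelling decision: the same three points cluster differently depending on the scales chosen.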
Location Analysis In Transportation
Published in Dušan Teodorović, The Routledge Handbook of Transportation, 2015
Dušan Teodorović, Branka Dimitrijević, Milica Šelmić
In the literature, the most commonly used distance measures are the Manhattan (or rectilinear, rectangular, or l1) distance and the Euclidean (or straight-line, or l2) distance. Love et al. (1988) recommended a weighted lp distance function for real-life problems in which road distances between points are approximated purely from the coordinates of the endpoints. They showed that road distances are usually 10–30 percent greater than the corresponding Euclidean distances.
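A minimal sketch of these distance functions; the weight and exponent used for the weighted lp road-distance approximation are illustrative placeholders, not values fitted to any road network:

```python
def minkowski(a, b, p):
    """lp distance between two points in the plane."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

depot, customer = (0.0, 0.0), (3.0, 4.0)

l1 = minkowski(depot, customer, 1)   # Manhattan / rectilinear: 7.0
l2 = minkowski(depot, customer, 2)   # Euclidean / straight line: 5.0

# Weighted lp approximation of a road distance: an inflation
# weight w and an exponent p would normally be fitted to the
# actual road network; w = 1.2, p = 1.7 are illustrative only.
w, p = 1.2, 1.7
road_estimate = w * minkowski(depot, customer, p)
print(l1, l2, road_estimate)
```

Consistent with the 10–30 percent observation, the estimate falls between the straight-line l2 distance and the rectilinear l1 distance.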
A Study of Distance Metrics in Document Classification
Published in Sk Md Obaidullah, KC Santosh, Teresa Gonçalves, Nibaran Das, Kaushik Roy, Document Processing Using Machine Learning, 2019
Ankita Dhar, Niladri Sekhar Dash, Kaushik Roy
Distance measures play an important role in many machine learning tasks. The purpose of a distance measurement is to estimate the similarity or dissimilarity between two terms. The distance measures considered are the squared Euclidean distance, Manhattan distance, Mahalanobis distance, Minkowski distance, Chebyshev distance, and Canberra distance.
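A rough Python sketch of these measures, using a hand-supplied inverse covariance matrix for the Mahalanobis distance and the usual convention of skipping zero denominators in the Canberra sum (the test vectors are illustrative):

```python
import math

def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    # Terms with |x| + |y| == 0 are conventionally taken as 0.
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)

def mahalanobis(a, b, inv_cov):
    # d = sqrt((a-b)^T S^-1 (a-b)), with S^-1 the inverse covariance.
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j]
                         for i in range(n) for j in range(n)))

u, v = (1.0, 2.0), (4.0, 6.0)
identity = [[1.0, 0.0], [0.0, 1.0]]
print(sq_euclidean(u, v))           # 25.0
print(manhattan(u, v))              # 7.0
print(chebyshev(u, v))              # 4.0
print(canberra(u, v))               # 1.1
print(mahalanobis(u, v, identity))  # 5.0 (identity S^-1 reduces to Euclidean)
```

With the identity as inverse covariance the Mahalanobis distance collapses to the ordinary Euclidean distance; a non-trivial covariance would reweight the components.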
A Hybrid Clustering Method Based on the Several Diverse Basic Clustering and Meta-Clustering Aggregation Technique
Published in Cybernetics and Systems, 2022
Bing Zhou, Bei Lu, Salman Saeidlou
Basically, a suitable distance measure can be very effective in clustering. However, clusters can take on various geometric shapes, so this challenge must also be considered. Moreover, the results of a clustering method should be interpretable in order to solve business problems. Therefore, scalability, features, dimensionality, shape, noise, and interpretability are the aspects clustering methods should take into account (Nasiri et al. 2022; Jadidi and Dizadji 2021). In general, the different clustering methods share a similar architecture; they differ in the distance/similarity criteria, the initial cluster values, and how the final clusters are formed. These differences have led to the development of different clustering methods over time. There are five main classes of clustering methods: Density-based Clustering (DC), Grid-based Clustering (GC), Model-based Clustering (MC), Hierarchical Clustering (HC), and Partitional Clustering (PC), as shown in Figure 1 (Wei, Li, and Zhang 2018).
A survey of deep learning approaches for WiFi-based indoor positioning
Published in Journal of Information and Telecommunication, 2022
Xu Feng, Khuong An Nguyen, Zhiyuan Luo
In addition, 21.4% (12 out of 56) of the regression systems adopt the popular machine learning method K-Nearest Neighbours (KNN) to make the final positioning estimation. After generating the features of a new position from the WiFi data collected by the user, the KNN algorithm measures the distance from the features of this new position to all training positions. The distance measure used can be the Euclidean, Manhattan, Minkowski, or a weighted distance. The KNN then selects the top K training positions closest to the new position and takes their average as the final position estimate. Weighted K-Nearest Neighbours (WKNN) is the weighted version of KNN and is more robust against variations in the KNN distances, which may otherwise lead to wrong decisions. In particular, the weight in WKNN can be the predicted probability that each selected training position is the exact position of the user. Algorithms used by the covered systems also include the Extended Kalman Filter (EKF), Maximum Likelihood Estimation (MLE), the dynamic Markov Decision Process (MDP), and Support Vector Regression (SVR). The main trend here is to calculate the weighted average of the candidate positions generated by deep learning methods.
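The WKNN step described above can be sketched as follows. The fingerprints, reference positions, and inverse-distance weighting scheme are illustrative assumptions, not taken from any of the surveyed systems (which may, as noted, use prediction probabilities as weights instead):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def wknn(fingerprint, train_features, train_positions, k=3):
    """Weighted-KNN position estimate from hypothetical WiFi fingerprints.

    Each training entry pairs a feature vector (e.g. signal strengths)
    with a known (x, y) position.  Weights here are inverse distances,
    one common choice.
    """
    nearest = sorted(
        (euclidean(fingerprint, f), pos)
        for f, pos in zip(train_features, train_positions)
    )[:k]
    eps = 1e-9                                  # avoid division by zero
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return (x, y)

# Toy example: three reference positions with 2-D "signal" features.
feats = [(-40.0, -70.0), (-60.0, -50.0), (-80.0, -30.0)]
poss  = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
print(wknn((-42.0, -68.0), feats, poss, k=2))
```

Because the query fingerprint is much closer to the first reference point, the inverse-distance weights pull the estimate close to (0, 0) rather than to the unweighted midpoint (2.5, 0).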
A novel approach to text clustering using genetic algorithm based on the nearest neighbour heuristic
Published in International Journal of Computers and Applications, 2022
D. Mustafi, A. Mustafi, G. Sahoo
Various distance measures have been used in clustering algorithms to compute the similarity between individual entities. Entities are usually associated with four kinds of attributes, i.e. nominal, ordinal, interval, and ratio, and different measures account for these attribute types in different ways. However, irrespective of their mathematical formulations, most distance measures satisfy the conditions of positivity, i.e. d(x, y) ≥ 0, reflexivity, i.e. d(x, x) = 0, symmetry, i.e. d(x, y) = d(y, x), and the triangle inequality, i.e. d(x, z) ≤ d(x, y) + d(y, z). The similarity between two entities can be derived by measuring the distance between the two objects. One of the most popular distance metrics is the Minkowski distance [53], or lp norm. Mathematically, this is defined as

d(x, y) = (Σᵢ₌₁ⁿ |xᵢ − yᵢ|^p)^(1/p)  (1)

If p = 1 or 2, Equation (1) defines the Manhattan distance metric [54] and the Euclidean distance metric [55], respectively.
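A short sketch of the Minkowski distance and the metric axioms above, using illustrative vectors:

```python
def minkowski(a, b, p):
    """Minkowski (lp) distance between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

x, y, z = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0), (0.0, 0.0, 0.0)

print(minkowski(x, y, 1))  # p = 1, Manhattan: 7.0
print(minkowski(x, y, 2))  # p = 2, Euclidean: 5.0

# The four metric axioms, checked for p = 2 on these vectors:
assert minkowski(x, y, 2) >= 0                    # positivity
assert minkowski(x, x, 2) == 0                    # reflexivity
assert minkowski(x, y, 2) == minkowski(y, x, 2)   # symmetry
assert minkowski(x, z, 2) <= minkowski(x, y, 2) + minkowski(y, z, 2)  # triangle
```

The assertions spot-check the axioms rather than prove them; they hold for any p ≥ 1, which is why the lp family yields proper metrics.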