DBSCAN – Knowledge and References

Basic Approaches of Artificial Intelligence and Machine Learning in Thermal Image Processing

Published in U. Snekhalatha, K. Palani Thanaraj, Kurt Ammer, Artificial Intelligence-Based Infrared Thermal Image Processing and Its Applications, 2023

U. Snekhalatha, K. Palani Thanaraj, Kurt Ammer

The previously discussed clustering methods are most suitable for well-separated, compact clusters and do not perform well when noise and outliers are present in the data. Density-based spatial clustering of applications with noise (DBSCAN) is an unsupervised clustering algorithm that finds clusters of arbitrary shape, as well as the noise, in a spatial database (Ester et al., 1996). The DBSCAN algorithm requires two parameters:

Epsilon: Describes the neighborhood of each pixel. If the distance between two pixels is less than or equal to the epsilon value, they are considered neighbors; otherwise, they are not.

Minimum Points: Defines the minimum number of neighbors within the epsilon radius. If the dataset is larger, a larger value of minimum points must be chosen.

The DBSCAN algorithm starts by finding all the neighboring pixels within the epsilon radius of each data point, and the core points, those with at least the minimum number of neighbors, are identified (Sander et al., 1998). If a core point has not already been assigned to a cluster, a new cluster is created for it. Every point that is density-connected to the core point is then found and assigned to the same cluster. These steps are repeated for each unvisited data point in the image. At the end of the algorithm, the pixels that have not been assigned to a cluster are labeled as noise.
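The steps described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the names `region_query` and `dbscan`, the brute-force neighbor search, the Euclidean metric, and the convention that a point counts toward its own neighborhood are all assumptions made for the example.

```python
import math

NOISE = -1        # label for points that end up in no cluster
UNVISITED = None  # placeholder before a point has been examined


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may be claimed later as a border point
            continue
        cluster += 1           # points[i] is a core point, so start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbors)
    return labels


# Two compact 2x2 groups and one isolated point (made-up coordinates).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force search costs quadratic time in the number of points; practical implementations usually rely on a spatial index instead.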

Clustering Divide and Conquer

Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022

Chong Ho Alex Yu

Density-based spatial clustering of applications with noise (DBSCAN), as the name implies, is a clustering algorithm based on the density (concentration) of the data (Ester et al. 1996; Schubert et al. 2017). In 2014, the algorithm won the Test of Time Award at the Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining (ACM SIGKDD) Conference. In DBSCAN, data points are grouped by their concentration so that dense and sparse areas are separated. The procedure is as follows:

Initially, the data are divided into n dimensions.

For each data point, DBSCAN forms an n-dimensional shape around that data point and counts how many points fall within that shape to form a cluster.

The cluster is iteratively expanded by checking the points within the cluster and other data points near the cluster. The process continues until no more points can be assimilated.

Machine Learning for Electron Microscopy

Published in Alina Bruma, Scanning Transmission Electron Microscopy, 2020

Alex Belianinov

In these cases, DBSCAN or hierarchical clustering methods may be a better choice. DBSCAN is better suited to data whose features cluster in aggregates of different sizes and shapes. The algorithm operates by estimating the density of the feature vectors via an input parameter ε, which sets the maximum distance between neighboring feature vectors in the same cluster. Unfortunately, only a single ε can be specified in DBSCAN. The OPTICS algorithm can be a solution for cases in which a single ε is a limitation. An additional feature for DBSCAN users interested in processing large quantities of data is the parallelism offered by HDBSCAN. The main problem with the density-based clustering approach is that a small percentage of feature vectors may remain unlabeled, complicating the cluster structure. In these cases, a tool such as a dendrogram, or clustering tree, can help to pinpoint similarities among feature vectors and feature vector groups.
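A toy one-dimensional example may help show why a single ε can be limiting. The sketch below is a deliberate simplification, not DBSCAN itself: it ignores the minimum-points threshold and simply chains sorted values whose consecutive gaps are at most ε. The function name `eps_components_1d` and the data are hypothetical.

```python
def eps_components_1d(values, eps):
    """Chain sorted 1D values into groups whose consecutive gaps are <= eps."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= eps:
            groups[-1].append(cur)   # close enough: same chain
        else:
            groups.append([cur])     # gap too large: start a new chain
    return groups


dense = [0, 1, 2, 3, 4]       # spacing 1
sparse = [12, 22, 32, 42]     # spacing 10, and only 8 away from the dense group
data = dense + sparse

small = eps_components_1d(data, eps=1)    # dense group intact, sparse points isolated
large = eps_components_1d(data, eps=10)   # sparse points linked, but merged with dense
print(len(small), len(large))  # 5 1
```

In this made-up data, any ε large enough to link the sparse group (at least 10) also bridges the gap of 8 to the dense group, so no single value recovers both groups separately.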

Temporal pattern mining of urban traffic volume data: a pairwise hybrid clustering method

Published in Transportmetrica B: Transport Dynamics, 2023

Iman Taheri Sarteshnizi, Majid Sarvi, Saeed Asadi Bagloee, Neema Nassir

To this end, DBSCAN, with the help of KNN for hyperparameter selection, is applied before any pairwise clustering. DBSCAN (density-based spatial clustering of applications with noise) is a density-based, non-parametric clustering algorithm capable of detecting outliers in the data (Ester et al. 1996). Applying DBSCAN-KNN for outlier detection benefits our methodology in several respects. First, it works entirely on the basis of the distance and reachability of data samples. Theoretically, this aligns with our intention, as we aim to eliminate data points that lie considerably far from the clusters. Second, the hyperparameters of DBSCAN are challenging to set without any model implementation; however, with assistance from KNN, we show that it becomes straightforward to use DBSCAN within our methodology. Furthermore, the literature shows that DBSCAN outperforms most other outlier detection and clustering algorithms on data with a low number of dimensions (Keogh and Lin 2005).
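The excerpt does not spell out how KNN assists with hyperparameter selection. One common heuristic, assumed here purely for illustration, is the sorted k-distance curve: compute each point's distance to its k-th nearest neighbor, sort the values, and choose ε near the point where the curve rises sharply. The function name `k_distances` and the sample points are hypothetical.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


# Four points on a unit square plus one far-away point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
curve = k_distances(points, k=2)
print(curve)
```

For this data the first four values equal 1.0 and the last exceeds 10, so any ε between those levels would keep the square together while leaving the distant point as an outlier.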

A design and implementation of heart disease prediction model using data and ECG signal through hybrid clustering

Published in Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 2023

Ritesh Sonawane, Hitendra Patil

The suggested heart disease prediction method uses DBSCAN together with the KMC approach for the final prediction. In DBSCAN (Sharma et al. 2016), the density at a point is estimated by ‘counting the number of points in a region of specified radius around the point’. Clusters are then formed from the points whose density meets a specified threshold value. DBSCAN is one of the most popular clustering methods, and it handles large spatial data efficiently and effectively. Its effectiveness stems from features such as the ability to find clusters of arbitrary shape. The number of clusters does not need to be specified in advance, which makes DBSCAN more realistic. Its space and time complexity depend on the application. It can also easily merge similar clusters.

Trip-pair based clustering model for urban mobility of bus passengers in Macao

Published in Transportmetrica A: Transport Science, 2023

W.K. Ku, K.P. Kou, S.H. Lam, K.I. Wong

The parameters in DBSCAN include the searching distance between two data points (ε) and the minimum number of neighbouring points (MinPts) within a circle of radius ε. Schubert et al. (2017) suggested that ε should be chosen as the smallest possible searching distance based on the application domain. However, to group adjacent trip-pairs into the same region, ε should be set larger than the distance to the diagonal trip-pair, that is, √2, and smaller than the distance to a trip-pair that is not adjacent, that is, 2. Thus, the searching radius ε was assumed to be 1.5 in this study. Further, MinPts was set at twice the dataset dimension, following the recommendation of Sander et al. (1998), that is, MinPts = 4, implying that passengers who travelled at least five times at a specified time (four trip-pairs) were considered. In addition, the Euclidean distance was employed to measure the distance between the trip-pairs because it is intuitive and well suited to this case.
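The parameter choice above can be checked with a little arithmetic. The sketch assumes that trip-pairs sit on a unit-spaced two-dimensional grid, so that a diagonal neighbour lies at distance √2 ≈ 1.414; the grid layout and the offset names are assumptions made for illustration only.

```python
import math

eps = 1.5                  # searching radius used in the study
dimension = 2              # assumed from MinPts = 4 being twice the dimension
min_pts = 2 * dimension

# Offsets from a trip-pair to three kinds of grid neighbours (hypothetical layout).
offsets = {
    "adjacent": (1, 0),
    "diagonal": (1, 1),
    "non-adjacent": (2, 0),
}

# Euclidean distance of each offset, and whether it falls inside the radius.
distances = {name: math.hypot(dx, dy) for name, (dx, dy) in offsets.items()}
within = {name: d <= eps for name, d in distances.items()}

for name in offsets:
    print(f"{name}: distance {distances[name]:.3f}, within eps: {within[name]}")
```

Under these assumptions, adjacent and diagonal trip-pairs fall inside the radius while the nearest non-adjacent one, at distance 2, does not.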