Big data in radiation oncology: Opportunities and challenges
Jun Deng, Lei Xing in Big Data in Radiation Oncology, 2019
Several machine learning algorithms have been used in oncology:
- Decision trees (DTs),63 where a simple algorithm creates mutually exclusive classes by answering questions in a predefined order.
- Naïve Bayes (NB) classifiers,64,65 which output probabilistic dependencies among variables.
- k-Nearest neighbors (k-NN),66 where a sample is classified according to its closest neighbors in the data set; it is used for both classification and regression.
- Support vector machines (SVMs),67 where a trained model classifies new data into categories.
- Artificial neural networks (ANNs),68 where models inspired by biological neural networks are used to approximate functions.
- Deep learning (DL),69 a variant of ANNs in which multiple layers of neurons are used.
Machine Learning Algorithms Used in Medical Field with a Case Study
K. Gayathri Devi, Kishore Balasubramanian, Le Anh Ngoc in Machine Learning and Deep Learning Techniques for Medical Science, 2022
K-Nearest Neighbor is a supervised machine learning algorithm that presumes similar things are present in close proximity. A common choice for this algorithm is to measure similarity by the Euclidean distance between points. Once the training data is loaded, the number of neighbors K is initialized. The distance between the query and each sample in the data is then calculated, and each sample's index and distance are added to an ordered collection, which is sorted by the distance values. The first K records in the sorted collection are picked along with their labels. For classification problems the mode of the K labels is returned; for regression, the mean. When the value of K is 1, the data point is assigned to the group of its single nearest neighbor. Several values of K are tried for classification or regression, and the one that fits best is chosen as the right K. The algorithm is simple to implement but slows down significantly as the data grows in size. A minimal sketch of these steps appears below.
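As an illustration of the procedure just described, here is a minimal Python sketch; the function name and toy data are ours, not the chapter's:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3, task="classification"):
    """Predict a label for `query` from its k nearest training samples."""
    # 1. Euclidean distance from the query to every training sample
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # 2. Sort sample indices by distance and keep the first k
    nearest = np.argsort(distances)[:k]
    labels = y_train[nearest]
    # 3. Mode of the k labels for classification, mean for regression
    if task == "classification":
        return Counter(labels).most_common(1)[0][0]
    return labels.mean()

# With k=1 the query simply takes the label of its single nearest neighbor
X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0]])
y = np.array([0, 0, 1])
print(knn_predict(X, y, np.array([1.1, 1.0]), k=1))  # -> 0
```

Note the brute-force distance computation over all samples: this is why the method, as the excerpt says, slows down significantly as the data grows.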
Basic Approaches of Artificial Intelligence and Machine Learning in Thermal Image Processing
U. Snekhalatha, K. Palani Thanaraj, Kurt Ammer in Artificial Intelligence-Based Infrared Thermal Image Processing and Its Applications, 2023
k-nearest neighbors, commonly known as k-NN, is one of the simplest algorithms and is easy to apply. It is a supervised machine learning algorithm that can be used to solve both regression and classification problems (Erickson et al., 2017), although its primary use is in classification. The algorithm classifies new data points based on their similarity to existing data points: it estimates the probability that a data point belongs to one group or another depending on which group the points closest to it belong to. To decide the category of an unseen observation, k-NN uses a voting process among the neighbors, and the class with the most votes is assigned to the data point in question, as the sketch below shows.
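In practice this voting classifier is usually taken from a library rather than hand-written. A short sketch using scikit-learn's KNeighborsClassifier (our choice of library, not the chapter's):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two clusters with labels 0 and 1
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [7.8, 8.2]]
y_train = [0, 0, 0, 1, 1]

# k=3: each prediction is a majority vote among the 3 nearest neighbors
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

print(clf.predict([[1.1, 0.9]]))        # -> [0]; all 3 neighbors vote for 0
print(clf.predict_proba([[1.1, 0.9]]))  # vote shares per class, here [[1. 0.]]
```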
Phylogenetic analyses of 41 Y-STRs and machine learning-based haplogroup prediction in the Qingdao Han population from Shandong province, Eastern China
Published in Annals of Human Biology, 2023
Guang-Yao Fan, De-Zhi Jiang, Yao-Heng Jiang, Wei Song, Ying-Yun He, Nixon Austin Wuo
The k-nearest neighbour (kNN) is a non-parametric supervised learning method which is helpful for both regression and classification (Altman 1992). Many prior studies have noted its potential for assigning each haplogroup based on the Y-STR haplotype (Song et al. 2019b; Yin et al. 2022), and its applicability was validated by a recent study (Fan 2022). In order to further enhance the predictive performance of the kNN model, a substantial training dataset was adopted for analysis using the "knn" package in the statistical environment R (Zhang 2016). The developed kNN predictor includes 23 common Y-STR loci and corresponding Y haplogroups from 3,248 Han males (Lang et al. 2019; Song et al. 2019a; Yin et al. 2020, 2022; Zhang et al. 2020). The algorithms were implemented using the R script available on GitHub (https://github.com/fanyoyo1983/knn-Y-haplogroup.git). Multi-copy loci and copy number variations (CNVs) were excluded from the machine learning input. The specificity and sensitivity of the kNN predictor were measured for each predicted haplogroup, and performance was also shown in a confusion matrix; the evaluation step is sketched below.
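As an illustration of that evaluation step, a minimal Python sketch computing per-haplogroup sensitivity and specificity from a confusion matrix (the study itself used R; the toy labels and variable names here are ours):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy true and predicted haplogroup labels (illustrative only)
y_true = ["O2", "O2", "C2", "N1", "C2", "O2"]
y_pred = ["O2", "C2", "C2", "N1", "C2", "O2"]

classes = sorted(set(y_true))
cm = confusion_matrix(y_true, y_pred, labels=classes)

for i, hap in enumerate(classes):
    tp = cm[i, i]                 # correctly predicted as this haplogroup
    fn = cm[i, :].sum() - tp      # this haplogroup predicted as another
    fp = cm[:, i].sum() - tp      # other haplogroups predicted as this one
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"{hap}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```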
Themis-ml: A Fairness-Aware Machine Learning Interface for End-To-End Discrimination Discovery and Mitigation
Published in Journal of Technology in Human Services, 2018
Zliobaite (2015) also describes consistency and the situation test score as individual-level discrimination measures. Consistency measures the difference between the target label of a particular observation and the target labels of its neighbors, where the neighbors knn(xi) are found from the pairwise distances between observations X. For each observation xi and each neighbor (xj, yj) ∈ knn(xi), we compute the difference between yi and the neighbor's target label yj, and average these differences. A consistency score of 0 indicates that there is no individual-level discrimination, and a score of 1 indicates that there is maximum discrimination in the dataset. A sketch of this computation follows.
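A minimal Python sketch of this measure as described above, assuming binary target labels; the function and its details are our reading of the description, not themis-ml's exact API:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def consistency_score(X, y, k=5):
    """Mean absolute difference between each label y_i and the labels of
    the k nearest neighbors of x_i. For binary labels this lies in [0, 1]:
    0 = no individual-level discrimination, 1 = maximum discrimination."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    # Ask for k+1 neighbors because each point is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbor_labels = y[idx[:, 1:]]           # drop the point itself
    return np.abs(y[:, None] - neighbor_labels).mean()

# Identical labels everywhere -> score 0 (perfectly consistent)
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3]]
print(consistency_score(X, [1, 1, 1, 1, 1, 1], k=2))  # -> 0.0
```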
Predicting changes in substance use following psychedelic experiences: natural language processing of psychedelic session narratives
Published in The American Journal of Drug and Alcohol Abuse, 2021
David J. Cox, Albert Garcia-Romeu, Matthew W. Johnson
All three algorithms led to similar prediction accuracies. Naïve Bayes Bernoulli led to the highest prediction accuracy (68%) compared to the k-nearest neighbors (63%) and the random forest (63%) algorithms using LSA-Alcohol as input. However, prediction accuracies were similar for the k-nearest neighbors, naïve Bayes Bernoulli, and random forest classifiers using LSA-All (50%, 51%, and 48%, respectively) and LSA-Scrubbed (48%, 51%, and 48%, respectively) as input. Each algorithm involves differing degrees of complexity and computational resources. This study indicates that the least complex and least resource-intensive algorithm (i.e., k-nearest neighbors) led to prediction accuracies similar to those of the other algorithms. Future research will have to determine the conditions under which more complex and resource-intensive algorithms outperform the k-nearest neighbors algorithm enough to make the tradeoff worthwhile.