Big data in radiation oncology: Opportunities and challenges
Jun Deng, Lei Xing in Big Data in Radiation Oncology, 2019
Several machine learning algorithms have been used in oncology:
- Decision trees (DTs),63 where a simple algorithm creates mutually exclusive classes by answering questions in a predefined order.
- Naïve Bayes (NB) classifiers,64,65 which output probabilistic dependencies among variables.
- k-Nearest neighbors (k-NN),66 where a sample is classified according to its closest neighbors in the data set; it is used for both classification and regression.
- Support vector machines (SVMs),67 where a trained model classifies new data into categories.
- Artificial neural networks (ANNs),68 where models inspired by biological neural networks are used to approximate functions.
- Deep learning (DL),69 a variant of ANNs in which multiple layers of neurons are used.
Machine Learning Algorithms Used in Medical Field with a Case Study
K. Gayathri Devi, Kishore Balasubramanian, Le Anh Ngoc in Machine Learning and Deep Learning Techniques for Medical Science, 2022
K-Nearest Neighbor is a supervised machine learning algorithm that presumes similar things are present in close proximity. A common choice for this algorithm is to measure similarity by the Euclidean distance between points. Once the training data is loaded, the number of neighbors K is initialized. The distance between the query and each sample in the data is then calculated, and each sample's index and distance are added to an ordered collection, which is sorted by the distance values. The first K records in the sorted collection are picked along with their labels. For classification problems the mode of the K labels is returned; for regression, the mean. When the value of K is 1, the data point is assigned to the group of its single nearest neighbor. Several values of K are tried for classification or regression, and the one that fits best is chosen as the right K. The algorithm is simple to implement but slows down significantly as the data grows in size. A minimal sketch of these steps appears below.
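As an illustration of the procedure just described, here is a minimal Python sketch; the function name and toy data are ours, not the chapter's:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3, task="classification"):
    """Predict a label for `query` from its k nearest training samples."""
    # 1. Euclidean distance from the query to every training sample
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # 2. Sort sample indices by distance and keep the first k
    nearest = np.argsort(distances)[:k]
    labels = y_train[nearest]
    # 3. Mode of the k labels for classification, mean for regression
    if task == "classification":
        return Counter(labels).most_common(1)[0][0]
    return labels.mean()

# With k=1 the query simply takes the label of its single nearest neighbor
X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0]])
y = np.array([0, 0, 1])
print(knn_predict(X, y, np.array([1.1, 1.0]), k=1))  # -> 0
```

Note the brute-force distance computation over all samples: this is why the method, as the excerpt says, slows down significantly as the data grows.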
Basic Approaches of Artificial Intelligence and Machine Learning in Thermal Image Processing
U. Snekhalatha, K. Palani Thanaraj, Kurt Ammer in Artificial Intelligence-Based Infrared Thermal Image Processing and Its Applications, 2023
k-nearest neighbors, commonly known as k-NN, is one of the simplest algorithms and is easy to apply. It is a supervised machine learning algorithm that can be used to solve both regression and classification problems (Erickson et al., 2017), although its primary use is in classification. The algorithm classifies new data points based on their similarity to existing data points: it estimates the probability that a data point belongs to one group or another depending on which group the points closest to it belong to. To decide the category of an unseen observation, k-NN uses a voting process among the neighbors, and the class with the most votes is assigned to the data point in question, as the sketch below shows.
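In practice this voting classifier is usually taken from a library rather than hand-written. A short sketch using scikit-learn's KNeighborsClassifier (our choice of library, not the chapter's):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two clusters with labels 0 and 1
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [7.8, 8.2]]
y_train = [0, 0, 0, 1, 1]

# k=3: each prediction is a majority vote among the 3 nearest neighbors
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

print(clf.predict([[1.1, 0.9]]))        # -> [0]; all 3 neighbors vote for 0
print(clf.predict_proba([[1.1, 0.9]]))  # vote shares per class, here [[1. 0.]]
```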
Phylogenetic analyses of 41 Y-STRs and machine learning-based haplogroup prediction in the Qingdao Han population from Shandong province, Eastern China
Published in Annals of Human Biology, 2023
Guang-Yao Fan, De-Zhi Jiang, Yao-Heng Jiang, Wei Song, Ying-Yun He, Nixon Austin Wuo
The k-nearest neighbour (kNN) is a non-parametric supervised learning method which is helpful for both regression and classification (Altman 1992). Many prior studies have noted its potential for assigning each haplogroup based on the Y-STR haplotype (Song et al. 2019b; Yin et al. 2022), and its applicability was validated by a recent study (Fan 2022). In order to further enhance the predictive performance of the kNN model, a substantial training dataset was adopted for analysis using the "knn" package in the statistical environment R (Zhang 2016). The developed kNN predictor includes 23 common Y-STR loci and corresponding Y haplogroups from 3,248 Han males (Lang et al. 2019; Song et al. 2019a; Yin et al. 2020, 2022; Zhang et al. 2020). The algorithms were implemented using the R script available on GitHub (https://github.com/fanyoyo1983/knn-Y-haplogroup.git). Multi-copy loci and copy number variations (CNVs) were excluded from the machine learning input. The specificity and sensitivity of the kNN predictor were measured for each predicted haplogroup, and performance was also shown in a confusion matrix; the evaluation step is sketched below.
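As an illustration of that evaluation step, a minimal Python sketch computing per-haplogroup sensitivity and specificity from a confusion matrix (the study itself used R; the toy labels and variable names here are ours):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy true and predicted haplogroup labels (illustrative only)
y_true = ["O2", "O2", "C2", "N1", "C2", "O2"]
y_pred = ["O2", "C2", "C2", "N1", "C2", "O2"]

classes = sorted(set(y_true))
cm = confusion_matrix(y_true, y_pred, labels=classes)

for i, hap in enumerate(classes):
    tp = cm[i, i]                 # correctly predicted as this haplogroup
    fn = cm[i, :].sum() - tp      # this haplogroup predicted as another
    fp = cm[:, i].sum() - tp      # other haplogroups predicted as this one
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"{hap}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```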
Themis-ml: A Fairness-Aware Machine Learning Interface for End-To-End Discrimination Discovery and Mitigation
Published in Journal of Technology in Human Services, 2018
Zliobaite (2015) also describes consistency and the situation test score as individual-level discrimination measures. Consistency measures the difference between the target label of a particular observation and the target labels of its neighbors, where the neighbors knn(xi) are found from the pairwise distances between observations X. For each observation xi and each neighbor (xj, yj) ∈ knn(xi), we compute the difference between yi and the neighbor's target label yj, and average these differences. A consistency score of 0 indicates that there is no individual-level discrimination, and a score of 1 indicates that there is maximum discrimination in the dataset. A sketch of this computation follows.
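A minimal Python sketch of this measure as described above, assuming binary target labels; the function and its details are our reading of the description, not themis-ml's exact API:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def consistency_score(X, y, k=5):
    """Mean absolute difference between each label y_i and the labels of
    the k nearest neighbors of x_i. For binary labels this lies in [0, 1]:
    0 = no individual-level discrimination, 1 = maximum discrimination."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    # Ask for k+1 neighbors because each point is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbor_labels = y[idx[:, 1:]]           # drop the point itself
    return np.abs(y[:, None] - neighbor_labels).mean()

# Identical labels everywhere -> score 0 (perfectly consistent)
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3]]
print(consistency_score(X, [1, 1, 1, 1, 1, 1], k=2))  # -> 0.0
```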
Predicting changes in substance use following psychedelic experiences: natural language processing of psychedelic session narratives
Published in The American Journal of Drug and Alcohol Abuse, 2021
David J. Cox, Albert Garcia-Romeu, Matthew W. Johnson
All three algorithms led to similar prediction accuracies. Naïve Bayes Bernoulli led to the highest prediction accuracy (68%) compared to the k-nearest neighbors (63%) and the random forest (63%) algorithms using LSA-Alcohol as input. However, prediction accuracies were similar for the k-nearest neighbors, naïve Bayes Bernoulli, and random forest classifiers using LSA-All (50%, 51%, and 48%, respectively) and LSA-Scrubbed (48%, 51%, and 48%, respectively) as input. Each algorithm involves differing degrees of complexity and computational resources. This study indicates that the least complex and least resource-intensive algorithm (i.e., k-nearest neighbors) led to prediction accuracies similar to those of the other algorithms. Future research will have to determine the conditions under which more complex and resource-intensive algorithms outperform the k-nearest neighbors algorithm enough to make the tradeoff worthwhile.