Results and Discussion
Arwa Ahmed Gasm Elseid, Alnazier Osman Mohammed Hamza in Computer-Aided Glaucoma Diagnosis System, 2020
ROC graphs are constructed by plotting the true positive rate (TPR) against the false positive rate (FPR). Figure 5.29 identifies a number of regions of interest in an ROC graph. The diagonal line from the bottom-left corner to the top-right corner represents random performance. In the extreme case, denoted by the point in the bottom-left corner, a conservative classification model classifies every instance as negative: it commits no false positives, but it also obtains no true positives. The region of liberal classifiers appears at the top of the graph; these classifiers achieve a good true positive rate, but at the cost of a substantial number of false positive errors. A classifier at the top-right corner classifies every instance as positive: in this situation it misses no true positives, but it also commits a very large number of false positives. If a classifier falls below the random performance line, its performance is worse than random, because it produces more false positive than true positive responses; however, because ROC graphs are symmetric about the random performance line, such a classifier can be inverted to give one above the line. The point in the top-left corner denotes perfect classification: a 100% true positive rate and a 0% false positive rate.
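The threshold-sweeping construction behind an ROC graph can be sketched in a few lines of Python. This is a minimal illustration with made-up scores and labels (the function name and data are our own, not from the book):

```python
def roc_points(scores, labels):
    """Compute (FPR, TPR) points by lowering a decision threshold
    over the classifier's scores (higher score = more positive)."""
    P = sum(labels)            # number of positive instances
    N = len(labels) - P        # number of negative instances
    points = [(0.0, 0.0)]      # threshold above every score: all negative
    tp = fp = 0
    # Visit instances in order of descending score, i.e. lower the
    # threshold one instance at a time.
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / N, tp / P))
    return points

# Toy example: four positives and four negatives with invented scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
pts = roc_points(scores, labels)
```

The curve always starts at (0, 0) (the conservative extreme) and ends at (1, 1) (everything classified positive), matching the corners described above.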
Radiobiological Evaluation and Optimisation of Treatment Plans
W. P. M. Mayles, A. E. Nahum, J.-C. Rosenwald in Handbook of Radiotherapy Physics, 2021
To create the ROC curve, the true positive ratio (TPR) and the false positive ratio (FPR) are calculated for a series of NTCP cut-offs. The curve is the relationship between FPR and TPR as the cut-off is decreased in small steps from NTCP = 100% to 0%; see Figure 44.25. The better the model separates responders from non-responders, the closer the curve will be to the top-left corner. By contrast, if the ROC curve follows the diagonal, the model is no better than a random labelling of positives and negatives. The area under the curve (AUC) is an intuitive measure of model performance, since it is equal to the probability that a randomly chosen responder has a higher NTCP than a randomly chosen non-responder. As a general rule, a value greater than 0.9 is excellent, between 0.7 and 0.9 moderate, and below 0.7 poor. The AUC is a rank-order statistic, and when calculated with the trapezoidal rule it is identical to the two-sample Mann-Whitney U-statistic (DeLong et al. 1988). DeLong et al. (1988) also give a method for calculating confidence intervals for AUC estimates and a test of the significance of the difference between two models' AUCs.
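The pairwise interpretation of the AUC can be checked directly: over all responder/non-responder pairs, count how often the responder has the higher NTCP (ties counting one half, as in the Mann-Whitney U). A minimal sketch with hypothetical NTCP values, not data from the chapter:

```python
def auc_mann_whitney(pos, neg):
    """AUC as the probability that a randomly chosen responder (pos)
    has a higher score than a randomly chosen non-responder (neg).
    Ties count 0.5; this equals the two-sample Mann-Whitney
    U-statistic divided by n_pos * n_neg."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical NTCP values for responders and non-responders.
responders = [0.8, 0.6, 0.55, 0.4]
non_responders = [0.5, 0.3, 0.2]
auc = auc_mann_whitney(responders, non_responders)
```

Here 11 of the 12 responder/non-responder pairs are correctly ordered, so the AUC is 11/12, which is what the trapezoidal rule over the corresponding ROC curve would give.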
The Promise of Artificial Intelligence and Machine Learning
Paul Cerrato, John Halamka in Reinventing Clinical Decision Support, 2020
Dermatology has also seen major advances in AI-driven image analysis. Andre Esteva, of the Department of Electrical Engineering at Stanford University, along with colleagues in the Stanford University Department of Dermatology and others, published a landmark study in Nature in 2017 demonstrating that a neural network–generated algorithm was as effective in diagnosing skin cancer as human dermatologists.14 To reach that conclusion, they trained the neural network on a data set containing over 129,000 clinical images and compared the resulting algorithms to the diagnostic performance of 21 board-certified dermatologists, evaluating the ability to differentiate keratinocyte carcinoma from benign seborrheic keratosis and malignant melanoma from benign nevi. The data set was derived from open-access dermatology repositories, the International Skin Imaging Collaboration (ISIC) Dermoscopic Archive, the Edinburgh Dermofit Library, and Stanford Hospital. The researchers used the area under the receiver operating characteristic (ROC) curve (AUC) to make their comparison. A perfect AUC of 1 indicates 100% accurate performance. For carcinoma and melanoma, the algorithm achieved AUCs of 0.96 and 0.91–0.94, respectively, superior to the performance of 21–24 dermatologists.
Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis
Published in Computer Assisted Surgery, 2019
Overall accuracy becomes meaningless when the class to be learned (cancer) is the minority class [24]. As pointed out by Raeder [25], other performance criteria must be considered, since evaluation metrics play an important role in classifier selection. Therefore, other performance criteria, namely the G-mean, the area under the curve (AUC), and the Matthews correlation coefficient (MCC), which are the most common criteria in class-imbalance learning, are used to validate performance. Large values of these criteria represent good classification performance. The AUC measures the area under the receiver operating characteristic (ROC) curve. The G-mean is the geometric mean of sensitivity and specificity, and the MCC measures the quality of imbalanced binary classification. The G-mean and MCC are calculated from the confusion matrix, presented in Table 2, as follows:
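As a concrete illustration, the G-mean and MCC can be computed from the four confusion-matrix counts. The counts below are hypothetical (an invented imbalanced result, not from the paper):

```python
import math

def gmean_mcc(tp, fn, fp, tn):
    """G-mean (geometric mean of sensitivity and specificity) and
    Matthews correlation coefficient from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    gmean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return gmean, mcc

# Hypothetical imbalanced test set: 20 positives, 180 negatives.
g, m = gmean_mcc(tp=15, fn=5, fp=10, tn=170)
```

With these counts, sensitivity is 0.75 and specificity about 0.94, so the G-mean (about 0.84) stays informative even though plain accuracy (92.5%) is dominated by the majority class.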
A simple ABCD score to stratify patients with respect to the probability of survival following in-hospital cardiopulmonary resuscitation
Published in Journal of Community Hospital Internal Medicine Perspectives, 2021
William R. Swindell, Christopher G. Gibson
We formulated a simple ABCD index score to predict CPR survival, with values ranging from −2 (higher predicted survival) to 4 (lower predicted survival) (Figure 4(a)). The four variables incorporated into the score were each significantly associated with survival to discharge when combined as covariates within the same general linear model (Table 1). A majority of patients (57%) had an ABCD score of 1 or greater (Figure 4(b)), and those with a score of −2 had a survival rate 3.8 times higher than those with a score of 4 (46.0% versus 12.1%) (Figure 4(c)). An ABCD score ≤2 was 96.8% sensitive for predicting survival to discharge (Figure 4(e)). Alternatively, an ABCD score ≤ −1 was 82.0% specific for predicting survival (Figure 4(f)). Accuracy and positive and negative predictive values likewise varied based upon the ABCD score threshold chosen (Figure 4(d,g,h)). The receiver operating characteristic (ROC) curve AUC statistic was modest (0.581) but significantly greater than the null expectation of 0.50 (95% CI: 0.577, 0.585; Figure 4(i)).
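The threshold-dependent sensitivity and specificity described above can be illustrated with a small sketch. The scores and outcomes below are invented for illustration and are not the study's data; the rule "score ≤ cutoff predicts survival" mirrors the direction of the ABCD score, where lower values predict higher survival:

```python
def threshold_metrics(scores, survived, cutoff):
    """Sensitivity and specificity of the rule
    'score <= cutoff predicts survival to discharge'."""
    tp = sum(1 for s, y in zip(scores, survived) if y and s <= cutoff)
    fn = sum(1 for s, y in zip(scores, survived) if y and s > cutoff)
    tn = sum(1 for s, y in zip(scores, survived) if not y and s > cutoff)
    fp = sum(1 for s, y in zip(scores, survived) if not y and s <= cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical cohort with ABCD scores ranging from -2 to 4
# (1 = survived to discharge, 0 = did not).
scores   = [-2, -1, 0, 1, 2, 3, 4, -1, 2, 4]
survived = [ 1,  1, 1, 1, 0, 0, 0,  0, 1, 0]
sens, spec = threshold_metrics(scores, survived, cutoff=1)
```

Raising the cutoff trades specificity for sensitivity, which is why the paper reports a high-sensitivity threshold (≤2) and a high-specificity threshold (≤ −1) separately.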
Automated prediction of COVID-19 mortality outcome using clinical and laboratory data based on hierarchical feature selection and random forest classifier
Published in Computer Methods in Biomechanics and Biomedical Engineering, 2023
Nasrin Amini, Mahdi Mahdavi, Hadi Choubdar, Atefeh Abedini, Ahmad Shalbaf, Reza Lashgari
One of the methods used to evaluate the performance of binary classification algorithms is the receiver operating characteristic (ROC) curve (Klawonn et al. 2011). In the ROC diagram, the sensitivity, or true positive rate (TPR, also known as recall), and the false positive rate (FPR, equal to 1 − specificity) are combined and displayed as a curve for a binary classification algorithm based on logistic regression (LR). The area under the ROC curve (AUC) is also used to evaluate the performance of binary classification algorithms on the given input features. The AUC, a very useful and easy-to-use measure, indicates how well the model distinguishes between the two classes and reveals the importance of the given input features (Mamitsuka 2006). The numerical value of the AUC ranges from zero to one, with values closer to one indicating good discrimination. Finally, the Wilcoxon rank-sum test was used to evaluate the significance of the extracted features. The Wilcoxon rank-sum test is a non-parametric test for two groups whose samples are independent of each other (Fay and Proschan 2010). The probability value (p-value) of this test indicates the probability of error in accepting the validity of the observed results. Utilizing this non-parametric analysis is a common method for selecting predictive features for classification algorithms. In this study, the ROC curve, AUC, and p-value criteria were used to evaluate the performance of the binary classification algorithms and select the best features for them.
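The connection between the Wilcoxon rank-sum statistic and the AUC of a single feature can be sketched as follows. The feature values below are hypothetical (invented lab measurements per outcome group, not the study's data); midranks handle ties, and U/(n1*n2) equals the AUC:

```python
def rank_sum_auc(pos, neg):
    """Wilcoxon rank-sum of the positive group (midranks for ties)
    and the equivalent AUC = U / (n_pos * n_neg): the rank-sum
    statistic and a single feature's AUC carry the same ordering
    information about the two groups."""
    combined = sorted(pos + neg)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # midrank of the tied block
        i = j
    r_pos = sum(ranks[v] for v in pos)
    n1, n2 = len(pos), len(neg)
    u = r_pos - n1 * (n1 + 1) / 2
    return r_pos, u / (n1 * n2)

# Hypothetical feature values (e.g. a lab measurement) per outcome group.
deceased = [3.2, 2.8, 2.8]       # "positive" class
survived = [2.5, 1.9, 2.0, 2.8]  # "negative" class
r, a = rank_sum_auc(deceased, survived)
```

A feature whose rank-sum p-value is small is exactly one whose AUC is far from 0.5, which is why the two criteria select similar features.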
Related Knowledge Centers
- Biometrics
- Data Mining
- False Positive Rate
- Power of A Test
- Sensitivity & Specificity
- Type I & Type II Errors
- Decision-Making
- Psychology
- Radiology
- Medical Diagnosis