Overfitting – Knowledge and References

A Comprehensive Study on MLP and CNN, and the Implementation of Multi-Class Image Classification using Deep CNN

Published in K. Gayathri Devi, Kishore Balasubramanian, Le Anh Ngoc, Machine Learning and Deep Learning Techniques for Medical Science, 2022

S.P. Balamurugan

When all features are connected to the fully connected layer, the model is prone to overfitting the training dataset. Overfitting occurs when a model performs well on training data but poorly on new data, which degrades the model's performance.
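
As a minimal sketch of this definition, the Python below uses a toy one-dimensional problem rather than a CNN. It compares a 1-nearest-neighbour "memorizer", which stores every training point including its label noise, with a simple threshold rule. The data generator, the 20% noise rate, and all function names are hypothetical. The memorizer scores perfectly on its own training data but not on fresh data drawn from the same process, which is the gap the passage describes.

```python
import random

def make_data(n, noise, rng):
    """Hypothetical 1-D data: the true label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that a well-generalizing model should ignore
        data.append((x, y))
    return data

def fit_memorizer(train):
    """1-nearest-neighbour rule: memorizes every training point, noise included."""
    def predict(x):
        nearest_x, nearest_y = min(train, key=lambda p: abs(p[0] - x))
        return nearest_y
    return predict

def threshold_rule(x):
    """A simple model that ignores the noise and only encodes x > 0.5."""
    return int(x > 0.5)

def accuracy(predict, data):
    """Fraction of (x, y) pairs the model labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(200, 0.2, rng)
test = make_data(200, 0.2, rng)  # "new data" drawn from the same process
memorizer = fit_memorizer(train)

for name, model in [("memorizer", memorizer), ("threshold", threshold_rule)]:
    print(f"{name}: train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")
```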

Case Studies/Success Stories on Machine Learning and Data Mining for Cancer Prediction

Published in Meenu Gupta, Rachna Jain, Arun Solanki, Fadi Al-Turjman, Cancer Prediction for Industrial IoT 4.0: A Machine Learning Perspective, 2021

Chander Prabha, Geetika Sharma

Last but not least, a major challenge with data mining and ML is overfitting and underfitting. If too few features are selected when training the model, underfitting can arise; if a large number of features is selected in the training phase, overfitting can sometimes arise. Either problem can make it difficult to achieve higher accuracy.
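
The trade-off can be illustrated with a small least-squares sketch in plain Python. As an assumption for illustration, the candidate "features" are the powers x^0 through x^7 of a single input, the target is sin(πx) plus Gaussian noise, and all helper names are hypothetical. With one feature the model underfits; with as many features as training points it interpolates the training data (near-zero training error) and will typically do worse on the held-out points than the four-feature model.

```python
import math
import random

def features(x, n_features):
    """Hypothetical feature set: the powers x^0, x^1, ..., x^(n_features - 1)."""
    return [x ** k for k in range(n_features)]

def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def fit(xs, ys, n_features):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [features(x, n_features) for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_features)]
           for i in range(n_features)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n_features)]
    return solve(xtx, xty)

def mse(w, xs, ys):
    """Mean squared error of the fitted model on (xs, ys)."""
    preds = [sum(wk * f for wk, f in zip(w, features(x, len(w)))) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def truth(x):
    return math.sin(math.pi * x)

rng = random.Random(1)
x_train = [-1 + 2 * i / 7 for i in range(8)]  # 8 evenly spaced training points
y_train = [truth(x) + rng.gauss(0, 0.2) for x in x_train]
x_test = [rng.uniform(-1, 1) for _ in range(200)]
y_test = [truth(x) + rng.gauss(0, 0.2) for x in x_test]

results = {}
for n_features in (1, 4, 8):  # too few, moderate, as many features as training points
    w = fit(x_train, y_train, n_features)
    results[n_features] = (mse(w, x_train, y_train), mse(w, x_test, y_test))
    print(n_features, "features: train MSE %.4f, test MSE %.4f" % results[n_features])
```

The normal equations are used only to keep the sketch dependency-free; a library solver based on QR or SVD is numerically safer for larger feature counts.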

I am going to make a prediction model. What do I need to know?

Published in Thomas A. Gerds, Michael W. Kattan, Medical Risk Prediction, 2021

Thomas A. Gerds, Michael W. Kattan

Overfitting is a serious threat to every prediction model. Overfitting means that the model is better at predicting the individuals in the dataset used to build the model than at predicting individuals in the background population or future patients. A more complex model will often outperform a less complex model on the training data. One drawback of the complex model is that it is often unclear whether it also predicts as-yet unseen patients more accurately. Moreover, because more complex models often issue more extreme predictions, which could motivate subjects to make a decision, it is important when comparing a complex model with a simple model not only to compare average performance but also to look into outliers in the personalized predictions.
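
The comparison recommended above can be sketched as follows, under stated assumptions: the predicted risks are made-up numbers for eight hypothetical patients, the Brier score stands in for "average performance", and the 0.05/0.95 cut-offs for an "extreme" prediction are illustrative choices, not values from the source. In this toy data the complex model has the lower (better) average Brier score, yet it also issues far more extreme predictions, and the individual with the largest disagreement between the two models is the natural first case to inspect.

```python
def brier_score(preds, outcomes):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(outcomes)

def extreme_share(preds, low=0.05, high=0.95):
    """Fraction of predictions outside [low, high]; the cut-offs are illustrative."""
    return sum(p < low or p > high for p in preds) / len(preds)

def largest_disagreements(simple, complex_, k=3):
    """Indices of the individuals whose two personalized predictions differ most."""
    order = sorted(range(len(simple)),
                   key=lambda i: abs(simple[i] - complex_[i]), reverse=True)
    return order[:k]

# Hypothetical predicted risks for eight patients (not from the source).
outcomes = [0, 1, 0, 1, 0, 0, 1, 0]
simple_model = [0.20, 0.60, 0.30, 0.70, 0.20, 0.30, 0.60, 0.40]
complex_model = [0.10, 0.90, 0.02, 0.97, 0.30, 0.60, 0.99, 0.10]

print("Brier simple: ", brier_score(simple_model, outcomes))
print("Brier complex:", brier_score(complex_model, outcomes))
print("extreme share:", extreme_share(simple_model), extreme_share(complex_model))  # 0.0 0.375
print("inspect patients:", largest_disagreements(simple_model, complex_model))
```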

A guide to optometrists for appraising and using artificial intelligence in clinical practice

Published in Clinical and Experimental Optometry, 2023

Timothy I Murphy, James A Armitage, Peter van Wijngaarden, Larry A Abel, Amanda G Douglass

Training AI in this way carries a risk of overfitting, where performance is good on the sample dataset but poor on novel data. Model overfitting is a common cause of poor real-world performance in some AI systems. To overcome this, the data may be split into three subsets: training, test, and validation sets [35,36]. Training data are used to train the system, with connection weights modified as described above. After each epoch, performance is assessed on the test dataset, which the AI has not been trained on and which is not used to modify weights. If performance is significantly better on the training data than on the test data, this can indicate overfitting. Once the model has converged, a final assessment against the validation set is performed to estimate performance on data to which the model has not previously been exposed.
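
A minimal sketch of the three-way split and the per-epoch check, assuming a plain Python list of examples, hypothetical 70/15/15 proportions, and a hypothetical 0.05 accuracy gap as the threshold for "significantly better". The subset names follow the passage (test set checked after each epoch, validation set reserved for the final assessment); many other texts swap the two names.

```python
import random

def three_way_split(items, train_frac=0.70, test_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint training, test and validation subsets.

    Naming follows the passage: the test set is checked after each epoch and the
    validation set is held back for the single final assessment.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

def overfitting_epochs(train_scores, test_scores, gap=0.05):
    """Epochs where training accuracy exceeds test accuracy by more than `gap`."""
    return [epoch for epoch, (tr, te) in enumerate(zip(train_scores, test_scores))
            if tr - te > gap]

train, test, validation = three_way_split(range(100))
print(len(train), len(test), len(validation))  # 70 15 15

# Hypothetical per-epoch accuracies (not from the source).
train_acc = [0.70, 0.80, 0.88, 0.95, 0.99]
test_acc = [0.68, 0.77, 0.82, 0.80, 0.78]
print(overfitting_epochs(train_acc, test_acc))  # [2, 3, 4]
```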

Machine Learning Approaches-Driven for Mortality Prediction for Patients Undergoing Craniotomy in ICU

Published in Brain Injury, 2021

Ronguo Yu, Shaobo Wang, Jingqing Xu, Qiqi Wang, Xinjun He, Jun Li, Xiuling Shang, Han Chen, Youjun Liu

In this study, we used the scikit-learn (15) package in Python to fit the ML models, including the LR algorithm, the traditional linear method widely used in medical PM (16–18); the RF algorithm, which generates multiple decision trees, offers better interpretability, and can establish correlations between features; the SVM algorithm, a supervised algorithm for dichotomous (binary) classification that can be used in high-dimensional feature spaces; the ANN algorithm, which has been successfully applied to predicting trauma mortality as a clinical outcome (19); and the XGBoost algorithm, an end-to-end tree boosting system that data scientists have widely used in recent years to achieve state-of-the-art results on many ML challenges, owing to its advantages in handling overfitting and missing values (20).

Molecular tissue profiling by MALDI imaging: recent progress and applications in cancer research

Published in Critical Reviews in Clinical Laboratory Sciences, 2021

Pey Yee Lee, Yeelon Yeoh, Nursyazwani Omar, Yuh-Fen Pung, Lay Cheng Lim, Teck Yew Low

Similar to other large-scale omics discovery analyses, the identification of molecular signatures as diagnostic and/or prognostic biomarkers using MALDI imaging is prone to the risks of overfitting and underfitting [126]. Overfitting refers to a model that fits and performs well on training data but fails to predict new test data. Underfitting refers to a model that cannot detect the relationship between the features and the outcome and therefore predicts poorly on any dataset. Predictive models using MALDI imaging data are often built from data-driven selection methods that prioritize molecular features based on their discriminative power [127]. However, owing to noise in the data or unknown interactions within the data, these methods may yield false-positive markers or miss important features. Several solutions have been proposed to combat these issues, such as integrating multi-omics data to account for the complexity of the biological system [128], combining biological knowledge with appropriate statistical analysis [129], and performing knowledge-based functional annotation [130].
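
The false-positive risk of data-driven selection can be shown with a simulation in which no feature carries any signal. This is a hedged sketch: it assumes pure Gaussian noise in place of MALDI intensities and uses the absolute difference in class means as a simple stand-in for "discriminative power" (real pipelines typically use statistical tests or classifier-based scores). Under those assumptions, the ten top-ranked "markers" from a discovery set tend to score far lower on an independent replication set. All names and sizes are hypothetical.

```python
import random

def make_noise_data(n_samples, n_features, rng):
    """Hypothetical data with no real signal: every feature is N(0, 1) noise."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced labels unrelated to X
    return X, y

def mean_difference(X, y, j):
    """Absolute difference in class means for feature j (a simple discriminative score)."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def select_top_features(X, y, k):
    """Rank all features by their score and keep the k highest."""
    scores = [mean_difference(X, y, j) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

rng = random.Random(42)
X_disc, y_disc = make_noise_data(40, 500, rng)  # discovery cohort
X_rep, y_rep = make_noise_data(40, 500, rng)    # independent replication cohort

top = select_top_features(X_disc, y_disc, k=10)
disc_scores = [mean_difference(X_disc, y_disc, j) for j in top]
rep_scores = [mean_difference(X_rep, y_rep, j) for j in top]
print("mean score of selected markers, discovery:   %.2f" % (sum(disc_scores) / len(top)))
print("mean score of selected markers, replication: %.2f" % (sum(rep_scores) / len(top)))
```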