Bayesian Applications
Published in Song S. Qian, Mark R. DuFour, Ibrahim Alameddine, Bayesian Applications in Environmental and Ecological Studies with R and Stan, 2023
Song S. Qian, Mark R. DuFour, Ibrahim Alameddine
The validation of BNs remains to date a major challenge and shortcoming of most environmental BN models, largely due to data scarcity and the dependence of the model (or parts of it) on expert judgment. Validation is thus often restricted to rigorous peer review and model reconciliation through a qualitative assessment of model resilience against changes in inputs [Moe et al., 2016]. Other studies have attempted to validate models by assessing their hindcast skill [Marcot, 2012]. Even when data are available, the ability to conduct cross-validation is often limited by model size and the associated amount of data needed to fully parameterize the CPTs. As such, and while several validation approaches have been proposed (e.g., error rates and confusion tables, area under the ROC curve, k-fold cross-validation, spherical payoff, Schwarz's Bayesian information criterion, Cohen's kappa), some of which have even been successfully implemented in several of the major commercial BN software packages [Marcot, 2012], model validation remains infrequent [Aguilera et al., 2011; Barton et al., 2008; Moe et al., 2016].
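As a minimal, hedged sketch of a few of the validation metrics listed above (error rates and confusion table, Cohen's kappa, area under the ROC curve), assuming observed and predicted node states are available as arrays; the data below are purely synthetic and do not come from any BN package:

```python
# Minimal sketch: a few of the validation metrics listed above, computed for a
# hypothetical discretized node prediction (synthetic data, no BN package used).
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score, roc_auc_score

rng = np.random.default_rng(0)
observed = rng.integers(0, 2, size=200)                             # observed states
scores = np.clip(0.6 * observed + rng.normal(0.2, 0.3, 200), 0, 1)  # predicted probabilities
predicted = (scores > 0.5).astype(int)                              # predicted states

print(confusion_matrix(observed, predicted))   # error rates / confusion table
print(cohen_kappa_score(observed, predicted))  # Cohen's kappa
print(roc_auc_score(observed, scores))         # area under the ROC curve
```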
Wound Tissue Classification with Convolutional Neural Networks
Published in Kayvan Najarian, Delaram Kahrobaei, Enrique Domínguez, Reza Soroushmehr, Artificial Intelligence in Healthcare and Medicine, 2022
Rafael M. Luque-Baena, Francisco Ortega-Zamorano, Guillermo López-García, Francisco J. Veredas
The proposed methodology, presented in the previous section, is illustrated in Figure 8.1. To avoid classification bias, a stratified K-fold cross-validation process has been carried out with K = 5 folds, setting 20% of the dataset aside for model testing. Furthermore, this process has been repeated 5 times to increase the robustness of the results. For each fold, there are 67 training images and 23 validation images (together 80% of the dataset), and 23 test images (20% of the dataset). After resampling to obtain the partition of images corresponding to each fold, the process described in section X.3.1 is applied, generating on average 6,652 ROIs for training, 2,362 for validation, and 2,305 for testing.
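A minimal sketch of this splitting scheme, assuming hypothetical per-image feature vectors and labels (the wound-image dataset, the network itself, and the ROI-extraction step of section X.3.1 are not reproduced here):

```python
# Minimal sketch: 80/20 train/test split followed by repeated stratified
# 5-fold cross-validation on the remaining 80% (hypothetical data).
import numpy as np
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold

X = np.random.rand(100, 32)        # hypothetical per-image feature vectors
y = np.random.randint(0, 2, 100)   # hypothetical per-image labels

# Set 20% of the dataset aside for model testing.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Stratified 5-fold cross-validation, repeated 5 times, on the remaining 80%.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
for train_idx, val_idx in cv.split(X_dev, y_dev):
    X_train, X_val = X_dev[train_idx], X_dev[val_idx]
    y_train, y_val = y_dev[train_idx], y_dev[val_idx]
    # ... extract ROIs, train the network on the training fold,
    #     monitor on the validation fold, and evaluate on X_test / y_test
```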
Choosing among Competing Specifications
Published in Douglas D. Gunzler, Adam T. Perzynski, Adam C. Carle, Structural Equation Modeling for Health and Medicine, 2021
Douglas D. Gunzler, Adam T. Perzynski, Adam C. Carle
The Browne-Cudeck criterion (BCC) [27] is a single-sample cross-validation index that can be derived for each of the competing models under study. The BCC gives an indication of which models are stable under cross-validation. Cross-validation is a resampling procedure used to evaluate model performance. In cross-validation, the data are split into training and test subsets. The average of the model evaluation scores across the subsets is then used to summarize model performance. Cross-validation methods do not depend on parametric assumptions. There is a slight bias in cross-validation methods, in that the training sample is smaller than the actual data set. However, the effect of this bias will typically be conservative, in that the estimated fit will be slightly biased in the direction suggesting a poorer fit [27]. See Hastie, Tibshirani and Friedman [28] for an overview of cross-validation techniques.
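The generic cross-validation procedure described here can be sketched as follows; the data and the linear model are hypothetical stand-ins, not the structural equation models that the BCC compares:

```python
# Minimal sketch: split the data into training and test subsets, evaluate the
# model on each test subset, and average the scores (hypothetical data/model).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X = np.random.rand(100, 4)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + np.random.normal(0, 0.1, 100)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(np.mean(scores))  # the averaged score summarizes model performance
```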
In silico QSAR modeling to predict the safe use of antibiotics during pregnancy
Published in Drug and Chemical Toxicology, 2023
Feyza Kelleci Çelik, Gül Karaduman
The QSAR models were validated both internally and externally. For the internal validation, ten-fold cross-validation was applied to all the machine learning models. k-fold cross-validation is a technique used to assess how well machine learning models perform on unseen data. The data sample is split into k groups of equal size, where k is a chosen number. In each cycle, k − 1 of the k groups were used as training data, while the remaining group was kept as a test set to validate the model. Once one testing cycle was completed, new training and test sets were created and the same procedure was performed on the refreshed sets; the process was complete after it had been repeated k times. The success of the model is obtained by summing the performances of all the classification runs and averaging by dividing by k. In our study, k was set to 10.
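A minimal sketch of this internal validation step, with hypothetical molecular descriptors, labels, and classifier standing in for the actual QSAR models:

```python
# Minimal sketch: 10-fold cross-validation of a classifier; the model's success
# is the mean of the 10 per-fold performances (hypothetical data and classifier).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(120, 10)        # hypothetical molecular descriptors
y = np.random.randint(0, 2, 120)   # hypothetical safety classes

scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10)
print(scores.mean())               # average of the k = 10 fold accuracies
```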
Prediction of emergence from prolonged disorders of consciousness from measures within the UK rehabilitation outcomes collaborative database: a multicentre analysis using machine learning
Published in Disability and Rehabilitation, 2023
Richard J. Siegert, Ajit Narayanan, Lynne Turner-Stokes
Models can be "data-fitting" or "cross-validated." A data-fitting model uses all the data for model construction, with no samples withheld for testing. A cross-validated model uses only a subset of the data for model construction and then tests the robustness of the model against the remaining withheld data. Common methods of cross-validation include leave-one-out (LOO) and x-fold. LOO constructs a model on all the samples except one and then tests the model on that withheld sample, repeated for every sample. This leads to the construction of as many models as there are samples, and the accuracy of prediction of LOO cross-validation (true positives plus true negatives over all samples) is reported at the end of all model evaluations. In x-fold cross-validation, the data are split into x equal-sized sets (typically x = 10). A model is constructed on all but one set and then tested against the withheld set. This is repeated for every set, with the overall average accuracy reported at the end of all evaluations.
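A minimal sketch contrasting the two schemes, with synthetic data and a simple classifier standing in for the models used in the study:

```python
# Minimal sketch: leave-one-out versus 10-fold cross-validation, reporting the
# overall average accuracy at the end (synthetic data, simple classifier).
import numpy as np
from sklearn.model_selection import cross_val_score, LeaveOneOut, KFold
from sklearn.linear_model import LogisticRegression

X = np.random.rand(60, 5)
y = np.random.randint(0, 2, 60)
clf = LogisticRegression(max_iter=1000)

loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()   # one model per sample
xfold_acc = cross_val_score(
    clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0)).mean()
print(loo_acc, xfold_acc)
```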
An elastic-net penalized expectile regression with applications
Published in Journal of Applied Statistics, 2021
Q.F. Xu, X.H. Ding, C.X. Jiang, K.M. Yu, L. Shi
In the ER-EN model, selecting a good tuning parameter λ is essential in high-dimensional cases, both to ensure global convergence and to implement the algorithm in practice. Cross-validation is a generally applicable way to predict the performance of a model on a validation set, using computation in place of mathematical analysis. It averages prediction errors across folds to derive a more accurate estimate of model prediction performance; see [27,28] for more details. In this paper, we adopt the widely used 10-fold cross-validation to conduct model selection. Moreover, we use the one-standard-error (1SE) criterion to determine the optimal λ. The 1SE criterion is commonly used in cross-validation and selects the most regularized model whose error is within one standard error of the minimum; see [69].
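A minimal sketch of the λ selection, using an ordinary elastic net as a stand-in for the ER-EN estimator (which is not available in standard libraries): the 1SE rule keeps the largest λ whose mean cross-validation error stays within one standard error of the minimum.

```python
# Minimal sketch: choose lambda by 10-fold cross-validation with the 1SE rule,
# using a standard elastic net as a stand-in for the ER-EN model.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

X = np.random.rand(100, 20)
y = X[:, 0] - 2 * X[:, 1] + np.random.normal(0, 0.5, 100)

lambdas = np.logspace(-3, 1, 30)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

fold_errors = np.empty((len(lambdas), cv.get_n_splits()))
for i, lam in enumerate(lambdas):
    for j, (tr, te) in enumerate(cv.split(X)):
        model = ElasticNet(alpha=lam, l1_ratio=0.5, max_iter=10000).fit(X[tr], y[tr])
        fold_errors[i, j] = mean_squared_error(y[te], model.predict(X[te]))

mean_err = fold_errors.mean(axis=1)
se_err = fold_errors.std(axis=1, ddof=1) / np.sqrt(fold_errors.shape[1])

i_min = mean_err.argmin()
# 1SE rule: the most regularized model (largest lambda) whose mean CV error is
# within one standard error of the minimum.
lam_1se = lambdas[np.where(mean_err <= mean_err[i_min] + se_err[i_min])[0].max()]
print(lambdas[i_min], lam_1se)
```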