Groundwater: Modeling
Published in Brian D. Fath, Sven E. Jørgensen, Megan Cole, Managing Water Resources and Hydrological Systems, 2020
The first step in any modeling effort involves constructing a conceptual model, describing it by means of appropriate governing equations, and translating the latter into a computer code. Model selection is the process of choosing among alternative model forms. Methods for model selection can be classified into three broad categories. The first category is based on a comparative analysis of residuals (differences between measured and computed system responses) using objective as well as subjective criteria. The second category, parameter assessment, involves evaluating whether the computed parameters can be considered “reasonable.” The third category relies on theoretical measures of model validity known as “identification criteria.” In practice, all three categories are needed: residual analysis and parameter assessment suggest ways to modify an existing model, and the resulting improvement in model performance is evaluated on the basis of identification criteria. If the modified model is judged an improvement over the previous model, the former is accepted and the latter discarded.
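As a minimal sketch of the third category, the Akaike information criterion (AIC) is one commonly used identification criterion; the measured and computed responses and parameter counts below are hypothetical illustrations, not values from the chapter.

```python
# Sketch: rank two candidate models by AIC computed from their residuals.
# AIC here uses the least-squares form n*log(RSS/n) + 2k (constant terms omitted).
import numpy as np

def aic(residuals, n_params):
    """Akaike information criterion from least-squares residuals."""
    n = len(residuals)
    rss = np.sum(np.asarray(residuals) ** 2)
    return n * np.log(rss / n) + 2 * n_params

# Hypothetical measured responses and the outputs of two candidate models
measured = np.array([10.2, 9.8, 9.1, 8.7, 8.0])
model_a  = np.array([10.0, 9.9, 9.0, 8.5, 8.1])   # 2 fitted parameters
model_b  = np.array([10.2, 9.8, 9.2, 8.6, 8.0])   # 4 fitted parameters

aic_a = aic(measured - model_a, n_params=2)
aic_b = aic(measured - model_b, n_params=4)
print("Prefer model", "A" if aic_a < aic_b else "B")
```

The criterion penalizes the extra parameters of model B, so a better fit alone is not enough for it to be preferred.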
Classical Statistics and Modern Machine Learning
Published in Mark Chang, Artificial Intelligence for Drug Development, Precision Medicine, and Healthcare, 2020
Overfitting and Model Selection
Overfitting refers to the phenomenon in which a statistical model fits the current dataset too well, for example by including too many parameters, so that when future data deviate from the current dataset due to the randomness of the data, the model does not predict the outcome well. To overcome overfitting and improve prediction, only parameters whose p-values remain significant after adjusting for multiplicity (multiple testing) are included in the model. Model selection methods such as the forward, backward, and stepwise methods are familiar to traditional statisticians. However, such model selection involves multiple tests, and when there are many parameters, as in genomics studies, the multiplicity adjustment can cause a significant loss of power to detect certain parameter effects.
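A sketch of what forward selection with multiplicity-adjusted p-values might look like, assuming ordinary least-squares models and a Bonferroni correction; the synthetic data, the 0.05 threshold, and the per-round adjustment are illustrative choices, not the book's procedure.

```python
# Sketch: forward selection that admits a predictor only if its p-value
# stays below alpha after a Bonferroni adjustment for the candidates tested.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                     # 10 candidate predictors
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)

alpha, selected = 0.05, []
remaining = list(range(X.shape[1]))
while remaining:
    # p-value of each candidate when added to the current model
    pvals = {}
    for j in remaining:
        cols = selected + [j]
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals[j] = fit.pvalues[-1]                 # p-value of candidate j
    best = min(pvals, key=pvals.get)
    # Bonferroni: multiply by the number of tests performed this round
    if pvals[best] * len(remaining) < alpha:
        selected.append(best)
        remaining.remove(best)
    else:
        break

print("Selected predictors:", selected)
```

With many candidate predictors, the Bonferroni factor grows with the number of tests, which is exactly the loss of power the passage warns about.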
An Optimal Diabetic Features-Based Intelligent System to Predict Diabetic Retinal Disease
Published in Ayodeji Olalekan Salau, Shruti Jain, Meenakshi Sood, Computational Intelligence and Data Sciences, 2022
M. Shanmuga Eswari, S. Balamurali
Model selection is the process of choosing one final machine learning model for a training dataset from a pool of candidate models. It can be applied to a variety of model families, such as logistic regression, SVM, k-NN, ensembles, and neural networks. To this end, one must compare the relative performance of the different models; as a result, the loss function and the metric that represents it become critical in determining the best and least overfitted models.
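As a hedged illustration of comparing candidate model families on a common metric, the sketch below scores several scikit-learn classifiers by cross-validated accuracy on synthetic data; it is not the authors' pipeline, and the metric and dataset are assumptions.

```python
# Sketch: compare candidate model families on one cross-validated metric.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
    "ensemble": RandomForestClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```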
A comprehensive comparison and analysis of machine learning algorithms including evaluation optimized for geographic location prediction based on Twitter tweets datasets
Published in Cogent Engineering, 2023
Hasti Samadi, Mohammed Ahsan Kollathodi
Model parameters are the configuration parameters that are internal to the model and can be learned from historic training data; their values are estimated from the input data. The model parameters specify how the input data are transformed into the desired output, while the hyperparameters define the structure of the model. In machine learning, hyperparameter optimization (or tuning) is the process of choosing an optimal set of hyperparameters for a learning algorithm, and the hold-out method can be used for the underlying model selection. A three-way holdout method was chosen for model selection, splitting the data into one set for training and further sets for validation and testing. The holdout method can be used for both model evaluation and model selection (Eisenstein et al., 2010; Pavalanathan & Eisenstein, 2015; Pennington et al., 2014).
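A minimal sketch of such a three-way holdout split, assuming a 60/20/20 partition on synthetic data (the authors' exact proportions and dataset are not reproduced here):

```python
# Sketch: three-way holdout split into train / validation / test sets.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First split off 40%, then halve it: 60% train, 20% validation, 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0)

# Fit each hyperparameter setting on the training set, pick the best on the
# validation set, and score the chosen model exactly once on the test set.
```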
Battle damage-oriented spare parts forecasting method based on wartime influencing factors analysis and ε-support vector regression
Published in International Journal of Production Research, 2020
Xiong Li, Xiaodong Zhao, Wei Pu
Cross validation is a method for model selection in terms of the predictive ability of the models. Suppose that n data points are available and a model is to be selected from a class of models. First, hold out one data point and use the remaining n-1 points to fit a model; then check the predictive ability of the model on the withheld point. Repeat this procedure for all data points and select the model with the best average predictive ability (Rao and Wu 2005). Cross validation has several advantages. First, the training data are not affected by random factors, so the training process can be replicated. Second, each training run uses almost all of the samples, which are therefore closest to the parent population, so the results are reliable. However, it also has disadvantages, such as a heavy computational load and limited applicability to small samples. The advantage of the grid search method is that it is highly likely to find the optimal points because the grid points are evaluated independently of one another; moreover, its complexity is low because few parameters are involved. Thus, the combination of cross validation and grid search is a very useful approach for battle damage-oriented spare parts forecasting.
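A sketch of combining cross validation with grid search for ε-SVR, using scikit-learn's GridSearchCV; the grid values, 5-fold scheme, and synthetic data are assumptions for illustration, not the paper's settings.

```python
# Sketch: grid search over epsilon-SVR hyperparameters, each candidate
# scored by k-fold cross validation.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],        # regularization strength
    "epsilon": [0.01, 0.1, 0.5],   # width of the epsilon-insensitive tube
    "gamma": ["scale", 0.01, 0.1], # RBF kernel width
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)
```

Because every grid point is fitted and scored independently, the search is easy to parallelize, which matches the independence argument made above.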
Identification of simplified energy performance models of variable-speed air conditioners using likelihood ratio test method
Published in Science and Technology for the Built Environment, 2020
Maomao Hu, Fu Xiao, Howard Cheung
Model selection is the process of selecting the most suitable model from a set of candidate models using statistical inference techniques, given a series of observations. The likelihood ratio test (LRT) is one of the most popular statistical model selection methods (Casella and Berger 2002). The main idea of the LRT is to compare the goodness of fit of two statistical models, i.e., a null model (representing the null hypothesis H0) and an alternative model (representing the alternative hypothesis H1), using the ratio λ of the likelihoods of the two models (Pawitan 2001). The implementation of the LRT is based on the likelihood function and maximum likelihood estimation (MLE), which are also prerequisites for the chi-squared test, Bayesian methods, and some model selection criteria such as the Akaike information criterion (Myung 2003). Bacher and Madsen (2011) used the LRT to identify the most suitable model for characterizing the thermal dynamics of a building; a forward selection procedure was developed to select the suitable model from a series of models of increasing complexity. Before the application to AC model selection, the fundamentals behind MLE and the LRT are briefly introduced in this section.
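As a brief illustration, the sketch below runs an LRT between two nested polynomial models fitted by Gaussian MLE; the data and model forms are illustrative stand-ins, not the AC performance models of the paper.

```python
# Sketch: LRT between a null (linear) and alternative (quadratic) model.
# The statistic -2*ln(lambda) = 2*(ll1 - ll0) is compared against a
# chi-squared distribution with df = number of extra parameters in H1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=50)

def gaussian_loglik(y, yhat):
    """Maximized Gaussian log-likelihood given least-squares fitted values."""
    n = len(y)
    sigma2 = np.mean((y - yhat) ** 2)   # MLE of the noise variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

ll0 = gaussian_loglik(y, np.polyval(np.polyfit(x, y, 1), x))  # null model
ll1 = gaussian_loglik(y, np.polyval(np.polyfit(x, y, 2), x))  # alternative

lrt_stat = 2 * (ll1 - ll0)               # -2 ln(lambda)
p_value = stats.chi2.sf(lrt_stat, df=1)  # one extra parameter under H1
print(f"LRT statistic = {lrt_stat:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the extra complexity of the alternative model is justified by a significantly better fit.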