Medical Diagnosis and Treatment Plans Derived from a Hybrid Expert System
Published in Abraham Kandel, Gideon Langholz, Lotfi A. Zadeh, Hybrid Architectures for Intelligent Systems, 2020
D. L. Hudson, M. E. Cohen, P. W. Banda, M. S. Blois
The crux of the neural network approach is the determination of the weighting factors which connect the nodes. These are determined through the use of a learning algorithm. A neural network learning algorithm developed by the authors was utilized [40]. This learning algorithm is non-statistical in nature. It utilizes generalized vector spaces to generate multidimensional decision surfaces, relying upon a new class of multidimensional orthogonal functions developed by Cohen [41]. A supervised learning approach is utilized, in which data of known classification are used to determine weights. An initial assignment is made, followed by an iterative procedure in which weights are adjusted until each case, in turn, is classified correctly. In this application, data were divided into a training set which was used to determine weights, and a test set which was used to ascertain the accuracy of the model.
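A generic sketch of this kind of supervised, iterative weight adjustment with a train/test split is given below. It uses a plain perceptron-style update rule, not the authors' vector-space method based on Cohen's orthogonal functions, and all names are placeholders.

# Perceptron-style sketch of supervised weight learning with a train/test split.
# This is NOT the authors' algorithm; it only illustrates iteratively adjusting
# weights until each training case is classified correctly, then measuring
# accuracy on a held-out test set.
import numpy as np

def train_weights(X_train, y_train, lr=0.1, max_epochs=100):
    """Adjust weights until every training case (labels in {-1, +1}) is classified correctly."""
    w = np.zeros(X_train.shape[1] + 1)           # initial assignment (bias + weights)
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(X_train, y_train):
            pred = np.sign(w[0] + w[1:] @ x) or 1.0
            if pred != y:                        # misclassified: adjust weights
                w[1:] += lr * y * x
                w[0] += lr * y
                errors += 1
        if errors == 0:                          # all training cases classified correctly
            break
    return w

def test_accuracy(w, X_test, y_test):
    """Ascertain accuracy of the trained weights on the held-out test set."""
    preds = np.sign(w[0] + X_test @ w[1:])
    return np.mean(preds == y_test)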
Application Development
Published in Scott E. Umbaugh, Digital Image Processing and Analysis, 2017
The training and test set paradigm is used in statistical analysis to generate unbiased results for a particular algorithm. A training set is used for training or developing the algorithm, and the test set is used for testing it. For our experiments, we used the leave-x-out method, with x set to both 1 and 10. In the leave-ten-out method, 10 samples from a data set of n samples are held out for testing and the algorithm is developed on the remaining (n − 10) samples. The 10 withheld samples are then tested. This procedure is repeated for n/10 iterations, each iteration developing the algorithm on (n − 10) samples and testing on the 10 remaining samples, which are not used in the training set. The leave-ten-out method was preferred over the leave-one-out method because it requires fewer computations to develop a classification model, and results were determined to be similar with both techniques.
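The leave-ten-out procedure described here can be sketched with scikit-learn's KFold by choosing n/10 folds of size 10; the classifier and data names below are placeholders, not those of the original experiments.

# Leave-ten-out as n/10 folds of size 10: each fold is withheld once for
# testing while the remaining n - 10 samples develop the algorithm.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

def leave_ten_out_accuracy(X, y):
    n = len(X)
    kf = KFold(n_splits=n // 10, shuffle=True, random_state=0)   # folds of ~10 samples
    scores = []
    for train_idx, test_idx in kf.split(X):
        clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))       # test the 10 held-out samples
    return np.mean(scores)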
Modern machine learning techniques and their applications
Published in Amir Hussain, Mirjana Ivanovic, Electronics, Communications and Networks IV, 2015
Mirjana Ivanović, Miloš Radovanovic
Overfitting is a notion related to the bias-variance tradeoff within supervised learning, and refers to a classifier being trained "too much," in the sense of maximizing its performance on the training set, which may in fact lead to suboptimal performance on a separate test set and on real-life data. Overfitting may arise from a small number of training instances, noisy data, and/or high dimensionality. Some classifiers are more prone to overfitting than others, and many of them employ complex strategies to avoid it. The philosophical analogue of the problem is Occam's razor, which in ML terms translates to preferring a simple model that fits the data reasonably well over a complex one that fits it more accurately.
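A small, self-contained illustration (not from the chapter) of this trade-off: a high-degree polynomial maximises training-set fit but typically generalises worse to a held-out test set than a simpler model.

# Compare a simple and a complex model on noisy data; the complex model's
# training error is lower, but its test error is typically higher.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)            # noisy data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (3, 15):                                            # simple vs. complex model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),          # training error
          mean_squared_error(y_te, model.predict(X_te)))          # test error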
Block structure optimization in PEMFC flow channels using a data-driven surrogate model based on random forest
Published in International Journal of Green Energy, 2023
Jiayang Zheng, Yanzhou Qin, Qiaoyu Guo, Zizhe Dong, Changrong Zhu, Yulin Wang
As shown in Figure 2, k-fold cross-validation divides the obtained sample data into k mutually exclusive subsets of similar size; each time, the union of k-1 subsets is used as the training set and the remaining subset as the test set, so that k train/test combinations are obtained. The training set is used to train the model, and the test set is used to evaluate how well the model predicts. This process is performed k times in total, training k models, and the final score is obtained by averaging the k evaluation results. In this study, fourfold cross-validation is used, implemented with the GridSearchCV module of the scikit-learn library in the Python environment.
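A hedged sketch of such a fourfold cross-validated grid search with a random-forest surrogate, using scikit-learn's GridSearchCV; the hyperparameter grid and variable names are assumptions, not the study's actual settings.

# Fourfold cross-validation via GridSearchCV: each parameter combination is
# scored as the average over the 4 train/test splits.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}   # assumed grid
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=4,                                  # fourfold cross-validation
    scoring="neg_mean_squared_error",
)
# search.fit(X, y)                         # X, y: sampled design variables and responses
# print(search.best_params_, search.best_score_)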
Covid-19 diagnosis by WE-SAJ
Published in Systems Science & Control Engineering, 2022
Wei Wang, Xin Zhang, Shui-Hua Wang, Yu-Dong Zhang
In machine learning, researchers usually divide the dataset into a training set, which is used for model training, and a test set, which is used to assess model performance and thus the generalisation of the model. However, machine learning is a data-driven science: the size of the dataset has a significant impact on the model's performance, with larger amounts of data tending to produce higher-performing models. Many studies face data scarcity, and dividing the dataset into training and test sets further reduces the amount of data available for training, which affects model performance. The core idea of Cross-Validation is to reuse data, increasing the amount of data available for training while still providing a test set. The K-fold Cross-Validation (Rajasekaran & Rajwade, 2021) used in this study is a widely used Cross-Validation method. It divides the dataset into a pre-specified K groups, takes each group in turn (without repetition) as the test set, uses all the remaining data as the training set, and calculates the model performance on the test set. Training is repeated K times, with a different group serving as the test set each time. Finally, the final performance is obtained by aggregating the model performances over the K tests. Figure 4 illustrates a concrete form of K-fold Cross-Validation. To obtain a more reliable and robust result, we used 10-fold Cross-Validation to divide the dataset.
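The 10-fold procedure can be sketched with scikit-learn as follows; the classifier and data names are placeholders, not the model used in the paper.

# 10-fold Cross-Validation: each fold serves once as the test set, the other
# nine folds train the model, and the scores over the 10 tests are aggregated.
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# scores = cross_val_score(SVC(), X, y, cv=cv)   # X: features, y: diagnosis labels
# print(scores.mean(), scores.std())             # final performance over the 10 tests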
Steady-state characteristics prediction of marine towing cable with BPNN
Published in Ships and Offshore Structures, 2022
Li Guo, Yuchao Yuan, Wenyong Tang, Hongxiang Xue
Generalisation is an important indicator for evaluating neural network ability. Poor generalisation usually has two causes, namely under-fitting and over-fitting. Under-fitting refers to poor prediction of the model on the training set; it is mainly due to vanishing gradients and is usually addressed by using a suitable activation function or adding neural units. Over-fitting means that the model fits the training set very well but predicts poorly on the validation set. The 'early-stopping' technique (Coulibaly et al. 2000) is adopted in this network to avoid over-fitting. Figure 4 shows the relationship between the different data sets. The training set is used to update the weights and biases of the neural network, the validation set evaluates the fit of the network after each update step and determines whether the iteration is terminated, and the test set is used to evaluate the generalisation of the final model. As illustrated in Figure 4, training stops automatically when the MSE of the validation samples stops decreasing.
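A minimal illustration of the early-stopping idea, using scikit-learn's MLPRegressor rather than the BPNN implementation of the paper; the layer sizes, fractions, and data names are assumptions.

# Early stopping: a fraction of the training data is held out as a validation
# set, and training halts once the validation score no longer improves.
from sklearn.neural_network import MLPRegressor

model = MLPRegressor(
    hidden_layer_sizes=(32, 32),
    early_stopping=True,          # hold out an internal validation fraction
    validation_fraction=0.15,     # validation set used to decide when to stop
    n_iter_no_change=10,          # stop after 10 epochs without improvement
    max_iter=2000,
    random_state=0,
)
# model.fit(X_train, y_train)     # training set updates weights and biases
# model.score(X_test, y_test)     # test set evaluates generalisation of the final model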