Learning Techniques
Published in Peter Wlodarczak, Machine Learning and its Applications, 2019
As mentioned before, different learners often behave very differently, and one key differentiator is how much bias and variance they produce. That is why several learners are usually trained and evaluated to find the most suitable algorithm for the problem. Another approach is ensembling. Bagging is a simple yet powerful ensemble method. It is a general procedure that can also be used to reduce variance. Bagging, short for bootstrap aggregating, is a resampling technique where different training sets are created using random selection with replacement (meaning an instance can be selected multiple times). Each sample is used to train a model. The predicted values are then combined through averaging or majority voting. Although bagging was first developed for decision trees, the idea can be applied to any algorithm that produces predictions with sufficient variation in the predicted values [1]. To get good bagged ensembles, the individual models are overfit, meaning the bias is low. Bagging reduces variance by averaging the predictions and can therefore be considered a variance reduction technique. Another feature of bagging is that the resulting model uses the entire feature space for node splits. The biggest weakness of bagging is that it might produce correlated trees, which limits the variance reduction achieved by averaging. Random forests, which were discussed in Chapter 4, are a simple yet effective way to reduce inter-tree correlation.
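To make the procedure concrete, the following is a minimal sketch of bagging with deep (low-bias, high-variance) decision trees as base learners; the synthetic dataset, the number of bootstrap rounds, and the other parameter values are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset; any classification data would do.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
n_models = 25
models = []

# 1. Create bootstrap samples (random selection with replacement)
#    and fit one deliberately overfit (unpruned) tree per sample.
for _ in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))   # sample indices with replacement
    tree = DecisionTreeClassifier()              # unpruned tree: low bias, high variance
    tree.fit(X[idx], y[idx])
    models.append(tree)

# 2. Combine the individual predictions by majority vote
#    (for regression one would average instead).
all_preds = np.stack([m.predict(X) for m in models])   # shape (n_models, n_samples)
bagged_pred = np.apply_along_axis(
    lambda votes: np.bincount(votes).argmax(), axis=0, arr=all_preds
)
print("training accuracy of bagged ensemble:", (bagged_pred == y).mean())
```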
Basic Approaches of Artificial Intelligence and Machine Learning in Thermal Image Processing
Published in U. Snekhalatha, K. Palani Thanaraj, Kurt Ammer, Artificial Intelligence-Based Infrared Thermal Image Processing and Its Applications, 2023
U. Snekhalatha, K. Palani Thanaraj, Kurt Ammer
Bagging is a meta-algorithm (an algorithm that constructs a new algorithm, often by combining or weighting the outputs of a set of other algorithms) used for statistical classification and regression. It improves the stability and accuracy of learning, and the resulting lower variance helps avoid overfitting. It is a model-averaging technique that applies base classifiers to random subsets of the original data and aggregates the individual predictions into a final prediction. Bagging resamples the original training dataset with replacement, so some instances may be present multiple times while others are left out. The concept of bagging is straightforward: fit several independent models and combine their predictions to obtain a model with lower variance. Fitting fully independent models is not possible in practice, as it would require a very large amount of data. Thanks to the approximate properties of bootstrap samples, however, it is possible to fit models that are nearly independent. Bootstrap sampling is a type of resampling that involves repeatedly drawing samples from the original data source with replacement. Bagging, also called bootstrap aggregation, is designed to increase the stability and accuracy of machine learning algorithms used in classification and regression, and it is used in particular to reduce variance on noisy datasets. By drawing multiple bootstrap samples, each new bootstrap sample functions as an approximately independent dataset drawn from the true distribution. A weak learner can then be fitted to each sample and the outputs averaged. Averaging the weak learners' outputs does not change the expected prediction, but it reduces its variance.
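As a small illustration of the bootstrap resampling described above, the sketch below draws one bootstrap sample from a toy dataset and counts how many original instances are repeated or left out; the dataset size and variable names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000                                   # size of the (toy) original training set
original_indices = np.arange(n)

# Draw one bootstrap sample: n indices selected with replacement.
bootstrap_indices = rng.choice(original_indices, size=n, replace=True)

unique = np.unique(bootstrap_indices)
print("instances appearing at least once:", len(unique))        # ~63% of n on average
print("instances left out (out-of-bag):  ", n - len(unique))    # ~37% of n on average
print("instances drawn more than once:   ",
      np.sum(np.bincount(bootstrap_indices, minlength=n) > 1))
```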
Evaluation of factors affecting long-term creep of concrete using machine learning regression models
Published in Joan-Ramon Casas, Dan M. Frangopol, Jose Turmo, Bridge Safety, Maintenance, Management, Life-Cycle, Resilience and Sustainability, 2022
Bagging, also called bootstrap aggregating, is one of the first ensemble algorithms machine learning practitioners learn and is designed to improve the accuracy and stability of classification and regression algorithms. Bagging fits multiple versions of a prediction model and then ensembles them into an aggregate prediction (Breiman 1996). Bagging works especially well for unstable, high-variance base learners, i.e. algorithms whose predicted output undergoes a major change in response to small changes in the training data (Dietterich 2000a).
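As an illustration of bagging a high-variance base learner for a regression task, the following is a minimal sketch using scikit-learn's BaggingRegressor with an unpruned decision tree; the synthetic data and parameter values are assumptions for illustration (the base-learner argument is named `estimator` in scikit-learn ≥ 1.2 and `base_estimator` in earlier versions).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for a real dataset.
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)

# A single unpruned tree: an unstable, high-variance base learner.
tree = DecisionTreeRegressor(random_state=0)

# Bagging: 100 trees, each fitted on a bootstrap sample, predictions averaged.
bagged = BaggingRegressor(estimator=tree, n_estimators=100, random_state=0)

print("single tree R^2:", cross_val_score(tree, X, y, cv=5).mean())
print("bagged trees R^2:", cross_val_score(bagged, X, y, cv=5).mean())
```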
On the use of text augmentation for stance and fake news detection
Published in Journal of Information and Telecommunication, 2023
Ilhem Salah, Khaled Jouini, Ouajdi Korbaa
Bagging and stacking are among the main classes of parallel ensemble techniques. Bagging (i.e. bootstrap aggregating) involves training multiple instances of the same classification algorithm, then combining the predictions of the obtained models through hard or soft voting. To promote diversity, base learners are trained on different subsets of the original training set. Each subset is typically obtained by drawing random samples with replacement from the original training set (i.e. bootstrap samples). Stacking (i.e. stacked generalization) involves training a learning algorithm (i.e. meta-classifier) to combine the predictions of several heterogeneous learning algorithms trained on the same training data. The most common approach to training the meta-model is k-fold cross-validation. With k-fold cross-validation, the whole training dataset is randomly split (without replacement) into k independent, equal-sized folds. k−1 folds are then used to train each of the base models, and the k-th fold (holdout fold) is used to collect the predictions of the base models on unseen data. The predictions made by the base models on the holdout fold, along with the expected class labels, provide the input and output pairs used to train the meta-model. This procedure is repeated k times; each time a different fold acts as the holdout fold while the remaining folds are combined and used to train the base models.
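The following is a minimal sketch of stacking as described above, using scikit-learn's StackingClassifier, whose `cv` argument implements exactly this k-fold scheme for collecting the base models' out-of-fold predictions; the choice of base learners, meta-classifier, and dataset are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

# Heterogeneous base learners trained on the same training data.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# The meta-classifier is trained on out-of-fold predictions of the base
# learners, collected via 5-fold cross-validation (cv=5).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)

print("stacked accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```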
Predicting sewer structural condition using hybrid machine learning algorithms
Published in Urban Water Journal, 2023
Bagging (bootstrap aggregating) was proposed by Breiman (1996) to significantly raise the stability of models in classification problems by improving accuracy and reducing variance. There are three main steps implemented in this model:
1. Creating multiple datasets: new sewer pipe data points are created by randomly selecting samples with replacement (i.e. an individual sewer data point can be chosen more than once) from the original training dataset.
2. Building multiple J48DT classifiers: the J48DT algorithm is trained independently on each of the random subsets from the previous step, and each J48DT predicts the sewer condition status from its subset.
3. Combining classifiers: the sewer condition status predictions of all the individual J48DT classifiers are combined to give a better classifier, usually with less variance than any single one. The final sewer condition status is determined by a plurality vote of the predictions from the J48DT models.
The concept of the bagging ensemble method is shown in Figure 6.
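Since J48DT is Weka's implementation of the C4.5 decision tree, a direct Python equivalent is not readily available; the sketch below reproduces the three steps with scikit-learn's BaggingClassifier and a DecisionTreeClassifier as a stand-in base learner (the argument is named `estimator` in scikit-learn ≥ 1.2), using synthetic data purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the sewer dataset (features + condition class).
X, y = make_classification(n_samples=800, n_features=12, n_classes=3,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 1-3: bootstrap datasets, one tree per dataset, predictions combined by vote.
bagged_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(),   # stand-in for the J48DT base classifier
    n_estimators=50,                      # number of bootstrap datasets / trees
    bootstrap=True,                       # sample with replacement (step 1)
    random_state=0,
)
bagged_trees.fit(X_train, y_train)        # steps 1 and 2
print("test accuracy:", bagged_trees.score(X_test, y_test))   # step 3: voted predictions
```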
A policy knowledge- and reasoning-based method for data-analytic city policymaking
Published in Building Research & Information, 2021
Sun-Young Ihm, Hye-Jin Lee, Eun-Ji Lee, Young-Ho Park
The decision tree technique is an algorithm used for classification and prediction by schematizing the rules of the data into a tree structure, and for this experiment a random forest-based decision tree is proposed to derive the important variables. For the decision tree, a top-down method is primarily used to select, at each step of the operation, the variable values that divide a given set of data according to the most appropriate criteria (Rokach & Maimon, 2005). The forecasting process is transparent, has the advantage of handling both numerical and categorical data concurrently, and is used in various areas related to forecasting (Bala & Kaur, 2017; Yahaya et al., 2019). However, this method has disadvantages: finding the optimal decision tree is extremely difficult, and overfitting can occur easily, which affects the general performance of the model on new datasets. In the random forest method, data are bootstrapped to create a forest; this is referred to as bagging. Bagging is an abbreviation for 'bootstrap aggregating' and is a method that employs bootstrapping to gradually combine initial classifiers trained on different training data (Breiman, 1996). This learning method does not feed all the data to every tree; instead, each tree receives a different bootstrap sample as its input, so that each tree is created from different data, thereby introducing randomness into the procedure. To address the shortcomings of the decision tree mentioned above, the more descriptive variables were selected with a random forest, in which each tree is built from different data and thus incorporates randomness.
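As a sketch of how a random forest can be used to derive the important variables, the example below fits scikit-learn's RandomForestClassifier (each tree grown on a bootstrap sample) and ranks features by impurity-based importance; the dataset and feature names are illustrative assumptions, not the policy data used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative dataset; in practice this would be the policy indicator data.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
feature_names = [f"var_{i}" for i in range(X.shape[1])]

# Each tree is trained on a bootstrap sample of the data (bagging),
# which introduces the randomness described above.
forest = RandomForestClassifier(n_estimators=200, bootstrap=True, random_state=0)
forest.fit(X, y)

# Rank the variables by impurity-based importance to derive the important ones.
ranking = np.argsort(forest.feature_importances_)[::-1]
for i in ranking:
    print(f"{feature_names[i]}: {forest.feature_importances_[i]:.3f}")
```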