Introduction
Published in Randall L. Eubank, Ana Kupresanin, Statistical Computing in C++ and R, 2011
Randall L. Eubank, Ana Kupresanin
5.34. The bootstrap is a resampling method that can be used to approximate the distribution of a statistic. Let X1, …, Xn be a random sample from some unknown distribution and let T(X1, …, Xn) be a corresponding statistic. Sampling from X1, …, Xn is then carried out with replacement to obtain B bootstrap samples X*1b, …, X*nb, b = 1, …, B. These
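A minimal R sketch of this resampling scheme follows; the data, the choice of statistic (the sample mean), and the value of B are illustrative placeholders, not taken from the exercise.

```r
set.seed(1)

x <- rnorm(25, mean = 10, sd = 2)   # placeholder sample X1, ..., Xn
n <- length(x)
B <- 2000                           # number of bootstrap samples

# T(X1, ..., Xn): the sample mean, purely as an example
T_stat <- function(s) mean(s)

# Draw B samples of size n with replacement and evaluate the statistic on each
T_boot <- replicate(B, T_stat(sample(x, size = n, replace = TRUE)))

# The empirical distribution of T_boot approximates the sampling distribution of T
hist(T_boot, main = "Bootstrap distribution of the sample mean")
quantile(T_boot, c(0.025, 0.975))   # simple percentile interval
```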
Bootstrap Methods and Their Deployment in SAS and R
Published in Tanya Kolosova, Samuel Berestizhevsky, Supervised Machine Learning, 2020
Tanya Kolosova, Samuel Berestizhevsky
The bootstrap is a resampling procedure used to estimate statistics of a population by repeatedly sampling from the original dataset. The method replaces complicated, and often inaccurate, analytical approximations of biases, variances, and other measures of uncertainty with computer simulation. In simple situations, the uncertainty of an estimate may be gauged by analytical calculations based on an assumed probability model for the available data. In more complicated cases, however, this approach can be tedious and difficult, and its results are potentially misleading if inappropriate assumptions or simplifications have been made.
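As a rough illustration of trading an analytical formula for simulation, the following R sketch (with made-up data) compares the bootstrap standard error of a sample mean against the usual s/√n estimate.

```r
set.seed(2)

x <- rexp(40, rate = 1)              # illustrative data from a skewed distribution
B <- 5000

# Bootstrap estimate of the standard error of the sample mean
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
se_boot    <- sd(boot_means)

# Analytical estimate under the usual formula s / sqrt(n)
se_formula <- sd(x) / sqrt(length(x))

c(bootstrap = se_boot, analytical = se_formula)
```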
Published in Bruce Ratner, Statistical and Machine-Learning Data Mining, 2017
The bootstrap is a flexible technique for assessing the accuracy of any statistic. For well-known statistics, such as the mean, the standard deviation, regression coefficients, and R-squared, the bootstrap provides an alternative to traditional parametric methods. For statistics with unknown properties, such as the median and Cum Lift, traditional parametric methods do not exist. Thus, the bootstrap provides a viable alternative to the inappropriate use of traditional methods, which yield questionable results.
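For a statistic such as the median, where no simple parametric standard-error formula is available, a bootstrap assessment of accuracy might look like the following R sketch; the data are simulated placeholders.

```r
set.seed(3)

x <- rlnorm(60)                      # illustrative skewed data
B <- 5000

# Bootstrap the median, a statistic without a convenient parametric standard error
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))

sd(boot_medians)                        # bootstrap standard error of the median
quantile(boot_medians, c(0.025, 0.975)) # percentile confidence interval
```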
Evaluating changes in flood regime in Canadian watersheds using peaks over threshold approach
Published in ISH Journal of Hydraulic Engineering, 2022
Kampanad Bhaktikul, Mohammed Sharif
Once the threshold was finalized for each station, four different measures of flood behaviour were extracted from the POT data. These measures include (i) duration of POT events, (ii) flow volumes in POT events, (iii) sum of durations above POT in each year, and (iv) sum of volumes in POT events in each year. An analysis of trends in these four variables was carried out using the Mann–Kendall nonparametric test (Mann 1945; Kendall 1975). The significance of the trends obtained from the Mann–Kendall test was evaluated using a bootstrap resampling approach. Bootstrap resampling involves computing the statistic of interest from samples of the same size drawn repeatedly from the original sample. A total of 1000 bootstrap replicates of the original time series of each variable at each station were generated in R, with a block size of 5 used for resampling. The trend analysis and the bootstrap resampling were carried out for each of the seven periods of record considered in this analysis.
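A hedged R sketch of this kind of block-bootstrap significance test is given below. It uses the boot package's tsboot() with a fixed block length of 5 and 1000 replicates as in the text, but the series, the Kendall's-tau trend statistic, and the p-value rule are illustrative assumptions rather than the authors' exact procedure.

```r
library(boot)

set.seed(4)
# Placeholder annual series: a mild linear trend plus AR(1) noise
y <- 0.02 * seq_len(50) + arima.sim(model = list(ar = 0.3), n = 50)

# Mann-Kendall-type trend statistic: Kendall's tau between the series and time
mk_tau <- function(series) cor(seq_along(series), series, method = "kendall")

tau_obs <- mk_tau(y)

# Fixed-block bootstrap (block length 5, 1000 replicates); resampling blocks
# preserves short-range dependence but destroys any overall trend
tb <- tsboot(y, statistic = mk_tau, R = 1000, l = 5, sim = "fixed")

# Two-sided bootstrap p-value: how often a trend-free resample gives a tau
# at least as extreme as the observed one
p_boot <- mean(abs(tb$t) >= abs(tau_obs))
c(tau = tau_obs, p_value = p_boot)
```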
How SMEs benefit from environmental sustainability strategies and practices
Published in Supply Chain Forum: An International Journal, 2022
Faiza Khoja, Jeffery Adams, Ralph Kauffman, Mikayel Yegiyan
The pathway model used bootstrapped multiple linear regression with bias-corrected and accelerated (BCa) confidence intervals (CI = .95) and 2,000 bootstrapped samples. This method allows path analysis to be tested with a small sample size, making the model more robust. The bootstrap method is a statistical technique for estimating quantities about a population by aggregating estimates computed from resampled data. Importantly, the samples are constructed by drawing observations from the original data sample one at a time and returning them to the data sample after they have been chosen (Shrout and Bolger 2002). The bootstrapped coefficients all showed confidence intervals that did not include 0, and in cases where the significance value differed between the normal regression model and the bootstrapped coefficient, the bootstrapped value was used. The model uses the beta coefficients where applicable and shows significance at the .05 level (*) and the .01 level (**). Error values are shown as 2√(1 − R²). Industry, enterprise age, and size (employee count) were controlled in the analysis for all regressions.
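A minimal sketch of case-resampling bootstrap regression with BCa intervals in R, using the boot package, is shown below. The data frame and variable names (y, x1, x2) are hypothetical; only the settings (2,000 resamples, 95% BCa confidence interval) follow the text.

```r
library(boot)

set.seed(5)
dat <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
dat$y <- 0.5 * dat$x1 + 0.3 * dat$x2 + rnorm(40)

# Statistic: refit the regression on each resampled set of rows and keep the coefficients
coef_fun <- function(data, idx) coef(lm(y ~ x1 + x2, data = data[idx, ]))

bt <- boot(dat, statistic = coef_fun, R = 2000)

# 95% bias-corrected and accelerated interval for the coefficient of x1 (index 2)
boot.ci(bt, conf = 0.95, type = "bca", index = 2)
```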
Classification of the mechanism of toxicity as applied to human cell line ECV304
Published in Computer Methods in Biomechanics and Biomedical Engineering, 2021
Yasser Abd Djawad, Janice Kiely, Richard Luxton
Three ensemble techniques were applied to analyse the data: bootstrap aggregating (bagging), boosting, and stacking. Bagging consists of two processes, bootstrap and aggregation. The bootstrap is a statistical technique that resamples the data with replacement to generate multiple sets of training data, while aggregation collects the classifier outputs obtained from each bootstrap sample into a final combined prediction, as shown in Figure 5(a). One method that uses a bagging algorithm is the random forest (RF). Boosting is a sequential tree-building process that uses information from the previous classifier's output: the prediction at each stage is based on the results of the previous classifier, so the process learns from earlier predictions to improve the final decision, as shown in Figure 5(b). In this study the gradient boosting machine (gbm) method was applied. Stacking is an ensemble learning technique that combines the results of several different classifiers (multiple classifiers) trained on the same initial data set. Each classifier produces an output that is used as input to a meta-classifier for the final combined decision, as shown in Figure 5(c). In this study, logistic regression (LR), linear discriminant analysis, k-nearest neighbours, support vector machine, decision tree and naive Bayes classifiers were used as base classifiers, and RF was used as the meta-classifier.
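The bootstrap-plus-aggregation steps that bagging performs can be sketched in a few lines of R. The example below uses rpart decision trees, majority voting, and the built-in iris data purely as stand-ins, not the study's actual classifiers or cell-line data.

```r
library(rpart)

set.seed(6)
n_trees <- 25
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Bootstrap step: fit one tree on each set of rows resampled with replacement
trees <- lapply(seq_len(n_trees), function(b) {
  boot_rows <- sample(nrow(train), replace = TRUE)
  rpart(Species ~ ., data = train[boot_rows, ], method = "class")
})

# Aggregation step: majority vote over the individual tree predictions
votes <- sapply(trees, function(fit) as.character(predict(fit, test, type = "class")))
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

mean(bagged_pred == test$Species)   # accuracy of the bagged ensemble
```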