Structural Equation Modeling
Published in Jhareswar Maiti, Multivariate Statistical Modeling in Engineering and Management, 2023
Bootstrapping is a resampling technique in which the original sample (of size N) is treated as a pseudo-population, from which K samples (K very large), each of size N, are drawn at random with replacement. The hypothesized SEM model is estimated with a chosen technique (e.g., ML, ULS, or GLS) for each of the K samples, so for each free parameter of the model we obtain K estimates. If γ is the parameter to be estimated, we have K values γ_1, …, γ_K, and the final estimate of γ is their mean:

γ̂ = (γ_1 + γ_2 + ⋯ + γ_K) / K
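A minimal sketch of this procedure in plain Python, using the sample mean as a stand-in for an SEM parameter estimate (fitting an actual SEM is beyond a short example; the data here are synthetic):

```python
import random
import statistics

random.seed(0)
# original sample of size N, treated as the pseudo-population
sample = [random.gauss(5.0, 2.0) for _ in range(200)]
N, K = len(sample), 2000  # K resamples, each of size N

estimates = []
for _ in range(K):
    # draw a size-N resample with replacement from the pseudo-population
    resample = random.choices(sample, k=N)
    # estimate the parameter on this resample (here: the mean)
    estimates.append(statistics.mean(resample))

# final estimate: the mean of the K replicate estimates
gamma_hat = statistics.mean(estimates)
```

The mean of the K bootstrap estimates will sit very close to the estimate computed on the original sample; the replicate values themselves are what provide standard errors and confidence intervals.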
Ensemble Methods: The Wisdom of the Crowd
Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022
There are four major types of resampling: the randomization exact test (also known as the permutation test), the Jackknife, cross-validation, and the bootstrap (Yu 2003; 2007). In the past, researchers needed to purchase specialized statistical software packages (e.g., StatXact) to run exact tests, but today the randomization exact test is a standard feature in most statistical software packages. When a chi-square analysis is invalid due to insufficient expected cell counts, the exact test becomes an alternative. In the Jackknife, the same test is repeated while leaving one subject out each time; thus, this technique is also called “leave one out.” Cross-validation is resampling without replacement. To be specific, after the observations are randomly assigned into training, validation, and testing groups, no observation can be put back into the sampling pool and re-assigned to another group. In contrast, bootstrapping is resampling with replacement, meaning that after the observations are randomly chosen, they are put back into the pool and can be selected again in the next run.
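The Jackknife's leave-one-out scheme is easy to sketch directly (toy data, with the mean as the statistic of interest):

```python
import statistics

data = [4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 5.0]

# Jackknife ("leave one out"): recompute the statistic N times,
# omitting one observation each time.
loo_means = [
    statistics.mean(data[:i] + data[i + 1:])
    for i in range(len(data))
]

# average the leave-one-out estimates
jackknife_estimate = statistics.mean(loo_means)
```

For the mean, the jackknife estimate coincides exactly with the full-sample mean; the technique earns its keep when the leave-one-out replicates are used to estimate the bias and standard error of less well-behaved statistics.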
Some Practical Notes
Published in Seyedeh Leili Mirtaheri, Reza Shahbazian, Machine Learning Theory to Applications, 2022
Resampling methods are essential techniques in modern machine learning. In these techniques, the training dataset is split into parts and a model of interest is fitted to each part in order to obtain additional information. For example, to estimate the variability of a linear regression, we can split the training dataset into subsets, fit a linear regression to each new group, and then examine the extent to which the resulting fits differ. Such an approach may yield information that would not be available from fitting the model only once on the original training sample. Resampling approaches can be computationally expensive, because they involve fitting the same algorithm multiple times on different subsets of the training data. However, due to recent advances in computing power, the computational requirements of resampling methods are generally not prohibitive. One of the most commonly used resampling techniques is cross-validation, which we discuss in this chapter.
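The linear-regression example above can be sketched as follows: split the data into random subsets, fit the model on each one, and look at the spread of the fitted coefficients (synthetic data; the slope is fitted by a hand-rolled least-squares formula rather than a library):

```python
import random
import statistics

random.seed(1)
# synthetic data: y ≈ 2x + noise
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]

def fit_slope(x, y):
    """Ordinary least-squares slope for a simple linear regression."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

# split the data into 5 random subsets and fit the model on each
idx = list(range(len(xs)))
random.shuffle(idx)
folds = [idx[i::5] for i in range(5)]
slopes = [fit_slope([xs[i] for i in f], [ys[i] for i in f]) for f in folds]

# variability of the fitted slope across the subsets
spread = statistics.stdev(slopes)
```

The spread of the five slopes is exactly the kind of information a single fit on the full dataset cannot provide.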
Assessing the operational efficiency of wastewater services whilst accounting for data uncertainty and service quality: a semi-parametric approach
Published in Water International, 2020
As mentioned in the introduction, the deterministic feature of DEA has been dealt with in this article. The classical DEA model developed by Charnes et al. (1978) is a deterministic approach, and it assumes that data inaccuracies, extreme outliers and data measurement uncertainties would not influence the efficiency scores, which is a major limitation. To overcome this limitation, this article applies the state-of-the-art bootstrapping method developed by Simar and Wilson (2007). Bootstrapping is a data simulation or resampling method first introduced by Efron (1979), which mimics the data generating process of the sampling distribution. In simple terms, the method generates ‘new data’ repeatedly from the original sample distribution, and this new or pseudo data can be used to re-estimate the DEA models specified in Equations (1) and (2). By repeatedly estimating (B bootstrap replicates) the DEA efficiency scores with new data points generated, it is possible to produce an empirical distribution of bootstrap values that yields a Monte Carlo approximation of the sampling distribution (Ananda, 2013). The technical details of the double-bootstrap procedure used in the analysis can be found in Simar and Wilson (2007).
Introducing an evolutionary-decomposition model for prediction of municipal solid waste flow: application of intrinsic time-scale decomposition algorithm
Published in Engineering Applications of Computational Fluid Mechanics, 2021
Linyuan Fan, Maryam Abbasi, Kazhal Salehi, Shahab S. Band, Kwok-Wing Chau, Amir Mosavi
Resampling is a pre-processing step that changes the original data distribution in order to meet some user-prescribed criteria. That is, the resampling method does not rely on generic distribution tables, such as normal distribution tables, to obtain probability values. In this technique, cases are drawn at random with replacement from the original data series, with the number of sampled cases equal to the length of the original series. There are several categories of resampling approaches, such as cross-validation, Jackknife resampling, random subsampling, and nonparametric bootstrapping (Fox, 2002).
Reweighting anthropometric data using a nearest neighbour approach
Published in Ergonomics, 2018
Kannan Anil Kumar, Matthew B. Parkinson
The estimation of anthropometry for a specific population may also rely on statistical resampling and/or weighting. This strategy modifies the available data to better represent the target user population. Resampling alters the number of datapoints in the data by removing or duplicating certain records to match the required overall characteristics of the target population. For example, the ANSUR data were downsampled from nearly 9000 men and women to create a subset of 1774 men and 2208 women that matched the age, race, and ethnicity of the US Army (Gordon et al. 1989).
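One simple way to realize this kind of reweighting is to draw from the source data with per-record weights equal to the ratio of target to source group shares. A minimal sketch with two hypothetical demographic groups (the group labels, shares, and sizes are all invented for illustration):

```python
import random
from collections import Counter

random.seed(0)
# source data: 70% group "A", 30% group "B" (hypothetical groups)
source = ["A"] * 700 + ["B"] * 300

# target population mix: 50/50
target_share = {"A": 0.5, "B": 0.5}
source_share = {g: c / len(source) for g, c in Counter(source).items()}

# weight each record by target share / source share for its group
weights = [target_share[g] / source_share[g] for g in source]

# weighted resample: group proportions now approximate the target mix
resampled = random.choices(source, weights=weights, k=400)
```

The resampled set over-represents the under-sampled group and under-represents the over-sampled one, which is the essence of matching a dataset to a target population's demographics.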