Euler/Poisson-Type Summation Formulas and Shannon-Type Sampling
Published in Willi Freeden, M. Zuhair Nashed, Lattice Point Identities and Shannon-Type Sampling, 2019
Willi Freeden, M. Zuhair Nashed
Actually, the identity (4.87) forms a quantification of the earlier mentioned Nyquist context, i.e., it explicitly includes all manifestations of over- and undersampling involving the intervals $\mathbb{B}^1_\rho$, $\mathbb{B}^1_\tau$ for the lattice $\Lambda = \sigma\mathbb{Z}$, $\sigma > 0$: Oversampling means that the signal is sampled so densely that "superfluous information" can be removed (see, e.g., M.Z. Nashed, Q. Sun [2010]) or used in an appropriate way to accelerate the convergence of the series (see, e.g., R.J. Marks II [1991]). Otherwise, we speak of undersampling. In the case of undersampling, we are confronted with the well-known phenomenon of aliasing: the signal reconstructed from the samples differs from the original continuous signal.
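For orientation, the classical one-dimensional statement behind this dichotomy (a standard result, not the authors' identity (4.87)) reads as follows: with the Fourier convention $\hat F(\xi) = \int_{\mathbb{R}} F(t)\, e^{-2\pi i t \xi}\, dt$, a signal $F$ band-limited to $\mathbb{B}^1_\rho = [-\rho, \rho]$ is exactly recoverable from its samples on the lattice $\Lambda = \sigma\mathbb{Z}$ whenever $\sigma \le 1/(2\rho)$:

$$
F(t) \;=\; \sum_{n \in \mathbb{Z}} F(n\sigma)\, \operatorname{sinc}\!\left(\frac{t - n\sigma}{\sigma}\right), \qquad \operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.
$$

Oversampling corresponds to $\sigma < 1/(2\rho)$, the critical (Nyquist) spacing to $\sigma = 1/(2\rho)$, and undersampling ($\sigma > 1/(2\rho)$) produces aliasing.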
Analog-to-Digital Converters
Published in John G. Webster, Halit Eren, Measurement, Instrumentation, and Sensors Handbook, 2017
Of course, sampling necessarily throws away some information, so the art of sampling is in choosing the right sample rate so that enough of the input signal is preserved. The major pitfall of undersampling (sampling too slowly) is aliasing, which happens whenever the input signal has energy at frequencies greater than one-half the sample rate. In Figure 88.1a, a signal (the fast sine wave) is sampled at a rate Fs, shown by the hash marks at the bottom of the graph. The sine wave has a frequency of 0.8Fs, which is higher than one-half the sample rate (0.5Fs). Notice that sampling the lighter sine wave of 0.2Fs produces the same set of samples. The resulting sampled data are ambiguous in that we cannot tell from the data what the frequency of the incoming sine wave actually is. In fact, even though the data set appears to represent a sine wave of 0.2Fs, the actual signal could be any sine wave having a frequency of nFs ± 0.2Fs, where n is any integer, starting with 0. So the original signal could be 0.2Fs, 0.8Fs, 1.2Fs, 1.8Fs, 2.2Fs, etc. (or even more than one of those). We say that 0.2Fs is the alias of a signal that may actually be at another frequency entirely. During interpretation of sampled data, it is customary to treat signals as though they occurred in the baseband (0–0.5Fs), whether or not that is the case. In general, in a system sampling at Fs, a signal at a frequency F will alias into the baseband at |F − nFs|, where n is the integer nearest to F/Fs.
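A minimal numpy sketch (illustrative, not from the handbook) reproduces the ambiguity described above: sampled at Fs, a 0.8Fs tone and its 0.2Fs alias yield identical samples.

```python
import numpy as np

Fs = 1000.0                  # sample rate in Hz (arbitrary illustrative choice)
n = np.arange(8)             # sample indices
t = n / Fs                   # sampling instants

fast = np.cos(2 * np.pi * 0.8 * Fs * t)   # 0.8*Fs tone, above 0.5*Fs: undersampled
slow = np.cos(2 * np.pi * 0.2 * Fs * t)   # 0.2*Fs tone in the baseband

# The two sampled sequences are indistinguishable
print(np.allclose(fast, slow))            # True

# The alias formula from the text: |F - n*Fs| with n nearest to F/Fs
F = 0.8 * Fs
print(abs(F - round(F / Fs) * Fs))        # 0.2*Fs = 200.0
```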
Basic Signal Processing Operations
Published in Nassir H. Sabah, Electric Circuits and Signals, 2017
The inverse process of convolution is deconvolution, which has the potential of undoing undesirable distortion introduced by a given system, provided this distortion can be accounted for with reasonable accuracy. The difficulties of implementing deconvolution and its limitations are discussed. Another important signal processing operation is sampling of a continuous signal. A relevant question is: how often must a signal be sampled so that no information in the signal is lost? The answer would allow faithful reconstruction of the signal from the sampled values. It turns out that undersampling, that is, sampling at a rate less than a critical value, introduces an insidious form of distortion, known as aliasing, which cannot be removed.
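The critical value in question is the Nyquist rate, twice the signal's highest frequency. A minimal sketch of the resulting reconstruction, assuming a signal band-limited to B Hz sampled at Fs > 2B (names and parameters here are illustrative, not from the chapter):

```python
import numpy as np

B = 3.0                      # highest frequency in the signal (Hz)
Fs = 10.0                    # sample rate, above the Nyquist rate 2*B = 6 Hz
T = 1.0 / Fs                 # sampling interval

def signal(t):
    # band-limited test signal: two tones at and below B
    return np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.cos(2 * np.pi * 3.0 * t)

n = np.arange(-200, 201)     # a long finite window stands in for "all n"
samples = signal(n * T)

def reconstruct(t):
    # Whittaker-Shannon interpolation: f(t) = sum_n f(nT) * sinc((t - nT)/T),
    # where np.sinc(x) = sin(pi*x)/(pi*x)
    return np.sum(samples * np.sinc((t - n * T) / T))

t0 = 0.123
print(signal(t0), reconstruct(t0))   # nearly equal away from the window edges
```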
Cross-project defect prediction using data sampling for class imbalance learning: an empirical study
Published in International Journal of Parallel, Emergent and Distributed Systems, 2021
Lipika Goel, Mayank Sharma, Sunil Kumar Khatri, D. Damodaran
For the above-stated research questions, we have conducted an empirical study on 12 publicly available datasets. The oversampling and undersampling techniques of data sampling have been studied and compared against CPDP models built without any data sampling technique. SMOTE [23] is used in this paper for oversampling and RUS [24] for undersampling. These techniques mitigate the problem of class imbalance. CK metrics [25] from the object-oriented software projects have been extracted and used in defect prediction. We have tried to validate the use of CK metrics in CPDP. In our previous work [24] on binary classification for CPDP and WPDP, we concluded that the Random Forest and Gradient Boosting ensemble algorithms outperformed all the other algorithms (LR, NVB, K-NN) used in the study. Therefore, in this experiment, we have selected ensemble learning models (Random Forest, Gradient Boosting) as the CPDP models. The performance evaluation measures used are F-measure, G-measure, and AUC. Experimental results over the 12 publicly available datasets show that the oversampling technique achieves the best results.
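A hedged sketch of this experimental setup, using the imbalanced-learn and scikit-learn implementations of SMOTE, RUS, the two ensemble models, and the three measures; the synthetic dataset is a placeholder for the CK-metric project data, which is not reproduced here.

```python
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# hypothetical stand-in for one of the 12 defect datasets (10% defective class)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, sampler in [("SMOTE", SMOTE(random_state=0)),
                      ("RUS", RandomUnderSampler(random_state=0)),
                      ("none", None)]:
    Xb, yb = sampler.fit_resample(X_tr, y_tr) if sampler else (X_tr, y_tr)
    for model in (RandomForestClassifier(random_state=0),
                  GradientBoostingClassifier(random_state=0)):
        model.fit(Xb, yb)
        p = model.predict(X_te)
        pd = recall_score(y_te, p)                   # probability of detection
        pf = 1 - recall_score(y_te, p, pos_label=0)  # probability of false alarm
        g = 2 * pd * (1 - pf) / (pd + (1 - pf))      # G-measure (harmonic mean)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(name, type(model).__name__, f1_score(y_te, p), g, auc)
```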
Happiness Index Determination by Analyzing Satellite Images for Urbanization
Published in Applied Artificial Intelligence, 2021
It has been observed that the accuracy of each model improves as the number of classes decreases. From the calculated outcome, it can be concluded that the MLP model achieved the highest accuracy, 65.25%. However, the achieved results are not satisfactory due to class imbalance in the dataset. To address the class imbalance, two major approaches are adopted: over-sampling and under-sampling. The over-sampling techniques SMOTE, SMOTE based on SVM, SMOTE based on borderline, RandomOverSampler, and ADASYN, and the under-sampling approach TomekLinks are employed. Furthermore, the calculated outcomes with respect to over-sampling and under-sampling are shown below.
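A minimal sketch of the resampling strategies listed above, using the imbalanced-learn implementations; a synthetic placeholder replaces the satellite-image features, which are not reproduced here.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import (SMOTE, SVMSMOTE, BorderlineSMOTE,
                                    RandomOverSampler, ADASYN)
from imblearn.under_sampling import TomekLinks

# hypothetical imbalanced dataset (15% minority class)
X, y = make_classification(n_samples=2000, weights=[0.85, 0.15], random_state=1)

samplers = {"SMOTE": SMOTE(), "SVM-SMOTE": SVMSMOTE(),
            "Borderline-SMOTE": BorderlineSMOTE(),
            "RandomOverSampler": RandomOverSampler(), "ADASYN": ADASYN(),
            "TomekLinks": TomekLinks()}

for name, s in samplers.items():
    _, y_res = s.fit_resample(X, y)
    print(f"{name:18s} class counts after resampling: {Counter(y_res)}")
```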
Predicting additive manufacturing defects with robust feature selection for imbalanced data
Published in IISE Transactions, 2023
Ethan Houser, Sara Shashaani, Ola Harrysson, Yongseok Jeon
Controlling the balance of the data that trains prediction models with sampling techniques can reduce the amount of bias toward the majority class. Sampling generally falls into two categories: (i) oversampling – duplicating, synthesizing, or augmenting data points in the minority class using methods such as Random Over-Sampling (ROS), Synthetic Minority Oversampling Technique (SMOTE), and Absent Data Generator (ADG) (Blagus and Lusa, 2013; Pourhabib et al., 2015; Wang and Ni, 2019); and (ii) undersampling – "removing observations from the majority class" using methods such as Random Under-Sampling (RUS) and Cluster Centroid Under-Sampling (CCUS) (Wang and Ni, 2019).
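A from-scratch numpy sketch of the two random variants named above, ROS and RUS; illustrative only, since SMOTE, ADG, and CCUS involve more than random duplication or removal.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_over_sample(X, y, minority=1):
    # (i) oversampling: duplicate minority points (drawn with replacement)
    # until the classes are balanced; assumes the minority class is smaller
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min), replace=True)
    keep = np.concatenate([idx_maj, idx_min, extra])
    return X[keep], y[keep]

def random_under_sample(X, y, minority=1):
    # (ii) undersampling: keep only as many majority points as minority points
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    kept_maj = rng.choice(idx_maj, size=len(idx_min), replace=False)
    keep = np.concatenate([kept_maj, idx_min])
    return X[keep], y[keep]

X = rng.normal(size=(100, 3))
y = (rng.random(100) < 0.2).astype(int)            # ~20% minority class
print(np.bincount(random_over_sample(X, y)[1]))    # balanced, majority-sized
print(np.bincount(random_under_sample(X, y)[1]))   # balanced, minority-sized
```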