Data Quality Management for Pervasive Health Monitoring in Body Sensor Networks
Published in Jacques Bou Abdo, Jacques Demerjian, Abdallah Makhoul, 5G Impact on Biomedical Engineering, 2022
Rayane El Sibai, Jacques Bou Abdo
Sensor precision: The data captured by a sensor can be influenced by random noise, causing it to deviate slightly from the true values. Precision is a measure of random noise [16]: it shows how close the data values are to each other. Noisy data is usually caused by fluctuations and interference in the environment, a low sensor battery, sensor hardware failure, improper sensor calibration, etc. [17]. Noise is often present in data and is difficult to control. Because of the noise, the data will be scattered around the true values. Random noise increases the variability of the data without affecting their average. Nevertheless, analyzing noisy data can negatively affect the results of data analysis. Thus, it is necessary to detect and remove noise in order to extract the relevant information from the data [18, 19].
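The effect described above can be illustrated with a minimal sketch (all values hypothetical): zero-mean random noise inflates the spread of sensor readings while leaving their average near the true value, and a simple moving-average filter reduces the scatter.

```python
import random
import statistics

random.seed(0)

# Hypothetical ground truth: a constant temperature of 25.0 degrees.
true_value = 25.0

# Simulate sensor readings corrupted by zero-mean Gaussian noise.
readings = [true_value + random.gauss(0, 0.5) for _ in range(1000)]

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)

# Random noise scatters readings around the true value: the standard
# deviation (a precision measure) reflects the noise level, while the
# mean stays close to the true value.
print(f"mean  = {mean:.2f}")    # close to 25.0
print(f"stdev = {stdev:.2f}")   # close to the noise level 0.5

# A simple moving-average filter reduces the random scatter.
window = 10
smoothed = [statistics.mean(readings[i:i + window])
            for i in range(len(readings) - window + 1)]
print(f"smoothed stdev = {statistics.stdev(smoothed):.2f}")
```

Averaging over a window of 10 readings shrinks the scatter by roughly a factor of sqrt(10), which is why smoothing is a common first step before analyzing noisy sensor streams.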
Fuzzy C-Mean and Density-Based Spatial Clustering for Internet of Things Data Processing
Published in Aboul Ella Hassanien, Nilanjan Dey, Surekha Borra, Medical Big Data and Internet of Medical Things, 2018
Heba El-Zeheiry, Mohammed Elmogy, Nagwa Elaraby, Sherif Barakat
Data cleaning is not an easy procedure. Noise may affect more than 30% of real-world data, which can make it questionable. In addition, cleaning is exceptionally costly. Data can be cleaned using procedures such as filling in missing values, smoothing noisy data, or resolving inconsistencies in the data. Numerous methods have been utilized to deal with missing data, such as:
- Removing: eliminates the records with missing values and uses the rest of the data in the analysis. This deletion can be inefficient, as it decreases the data set size and may discard valuable data.
- Replacement: fills in the missing values with the support of methods such as:
  - Mean or Mode: replaces a missing value with the mean of a numeric attribute, or the mode of a nominal attribute, computed over all the data.
  - KNN Imputation: deploys the KNN algorithm to fill in the missing data. It can handle both discrete and continuous attributes. KNN searches the data to discover the most closely related instances and selects a plausible value from the data set (Figure 7.2).
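The two replacement strategies above can be sketched in a few lines of plain Python. This is a minimal illustration on a hypothetical two-attribute data set, not the chapter's implementation; `impute_mean` and `impute_knn` are names introduced here for clarity.

```python
# Toy records: (weight_kg, height_cm); None marks a missing value.
data = [
    (70.0, 175.0),
    (72.0, 178.0),
    (90.0, 185.0),
    (71.0, None),   # missing height
]

# Mean imputation for a numeric attribute: replace None with the
# column mean computed over the observed values.
heights = [h for _, h in data if h is not None]
mean_height = sum(heights) / len(heights)

def impute_mean(record):
    w, h = record
    return (w, mean_height if h is None else h)

# KNN imputation (k=2): find the k records closest on the observed
# attribute and average their values for the missing attribute.
def impute_knn(record, k=2):
    w, h = record
    if h is not None:
        return record
    neighbors = sorted(
        (r for r in data if r[1] is not None),
        key=lambda r: abs(r[0] - w),
    )[:k]
    return (w, sum(r[1] for r in neighbors) / k)

print(impute_mean(data[3]))  # height <- mean of all observed heights
print(impute_knn(data[3]))   # height <- mean of 2 nearest neighbors
```

Note the difference: mean imputation uses a global statistic, while KNN imputation borrows values from the instances most similar to the incomplete record, which usually yields a more plausible fill-in.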
Analysis of Unimodal and Multimodal Biometric System
Published in Chiranji Lal Chowdhary, Intelligent Systems, 2019
The captured information may be noisy or distorted. A fingerprint with a scar, or a voice altered by a cold, are examples of noisy data. Noisy data can also result from defective or improperly maintained sensors (e.g., the accumulation of dirt on a fingerprint sensor) or unfavorable ambient conditions (e.g., poor illumination of a user's face in a face recognition system). Noisy biometric information may be incorrectly matched against templates in the database, resulting in a genuine user being incorrectly rejected.
Using Skeleton Correction to Improve Flash Lidar-based Gait Recognition
Published in Applied Artificial Intelligence, 2022
Nasrin Sadeghzadehyazdi, Tamal Batabyal, Alexander Glandon, Nibir Dhar, Babajide Familoni, Khan Iftekharuddin, Scott T. Acton
To alleviate the effect of noisy data, a common approach is to remove the outlier data points generated by faulty measurements and apply further processing to the remaining higher-quality data. In Chi, Wang, and Q.-H. Meng (2018) and Choi et al. (2019), the authors used weighting schemes to reduce the effect of noisy and low-quality skeletons. However, they did not address cases where the whole skeleton is missing. The discussed methods are effective, but they usually rely on high-quality data collected by Mocap or Kinect, which are generally limited to controlled environments. These limitations call for depth-based modalities such as flash lidar, which is applicable in real-world scenarios. Using flash lidar raises new problems in gait recognition; in turn, these problems provide an opportunity to develop novel methods that improve gait recognition.
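The outlier-removal step mentioned above is often a simple statistical filter. As a minimal sketch (the data and threshold are hypothetical, not taken from the paper), measurements far from the mean can be discarded before further processing:

```python
import statistics

# Toy sequence of joint-position estimates (e.g., one skeleton
# coordinate across frames); two faulty measurements produce outliers.
frames = [1.02, 1.00, 0.99, 1.01, 5.70, 1.03, 0.98, -3.20, 1.00, 1.02]

mean = statistics.mean(frames)
stdev = statistics.stdev(frames)

# Keep only measurements within 1.5 standard deviations of the mean;
# the remaining higher-quality data is passed on for further processing.
clean = [x for x in frames if abs(x - mean) <= 1.5 * stdev]
print(clean)
```

Such filters are easy to apply per joint and per frame, but, as the paragraph above notes, they cannot help when an entire skeleton is missing; that case requires a different correction strategy.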
Investigating photovoltaic solar power output forecasting using machine learning algorithms
Published in Engineering Applications of Computational Fluid Mechanics, 2022
Yusuf Essam, Ali Najah Ahmed, Rohaini Ramli, Kwok-Wing Chau, Muhammad Shazril Idris Ibrahim, Mohsen Sherif, Ahmed Sefelnasr, Ahmed El-Shafie
As PV solar power output is heavily impacted by weather and meteorological factors, including solar irradiance, humidity, and temperature, the data series is typically very noisy. Noisy data degrades the performance of an ML model by reducing its ability to generalize (Meng & Song, 2020). RFs are suitable for PV solar power output forecasting due to their characteristics, namely random feature selection, bootstrap sampling, out-of-bag error estimation, and full-depth decision tree growing (Meng & Song, 2020). RFs reduce the effect of noise in data through random feature selection and bootstrap sampling: given the input data samples, an RF first draws a subset of samples by bootstrap sampling and then randomly selects the features of these samples, enabling it to handle noisy data better (Meng & Song, 2020). One of the primary drawbacks of the RF algorithm is the difficulty of interpreting causal links between predictors and outputs, owing to the use of multiple decision trees within the algorithm; hence, it is most useful where high prediction accuracy is prioritized over interpretability (Aria et al., 2021; Wongvibulsin et al., 2019).
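The two noise-reducing mechanisms named above can be sketched in isolation. The snippet below (a toy illustration with hypothetical feature values, not a full RF implementation) shows how each ensemble member is given a bootstrap resample of the rows and a random subset of the features, so that no single noisy sample or feature dominates the ensemble:

```python
import random

random.seed(42)

# Toy data set: rows of (irradiance, humidity, temperature) features
# with a PV-output target (all values hypothetical).
X = [[800, 40, 25], [650, 55, 22], [900, 35, 30], [400, 70, 18],
     [750, 45, 27], [500, 60, 20], [850, 38, 28], [300, 80, 15]]
y = [5.2, 4.0, 6.1, 2.3, 4.9, 3.0, 5.8, 1.5]
feature_names = ["irradiance", "humidity", "temperature"]

def bootstrap_sample(X, y):
    """Draw n rows with replacement (the bagging step of an RF)."""
    idx = [random.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def random_feature_subset(n_features, k):
    """Pick k candidate features at random for one tree."""
    return random.sample(range(n_features), k)

# Each ensemble member sees a different resampled view of the data
# and a different feature subset; averaging the members' predictions
# smooths out the influence of noisy samples.
for t in range(3):
    Xb, yb = bootstrap_sample(X, y)
    feats = random_feature_subset(len(feature_names), k=2)
    print(f"tree {t}: features={[feature_names[f] for f in feats]}, "
          f"mean target={sum(yb) / len(yb):.2f}")
```

In a library implementation (e.g., scikit-learn's `RandomForestRegressor`), both mechanisms are built in; the sketch only makes visible why averaging many such randomized trees dampens sample noise.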
Resampling-based noise correction for crowdsourcing
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2021
Wenqiang Xu, Liangxiao Jiang, Chaoqun Li
This paper focuses on label noise correction for crowdsourcing learning. The core idea of label noise correction is to identify noise in the integrated labels and then correct it. Noise identification is usually achieved with noise filters, which split the data into two sets: a clean data set and a noisy data set. Existing noise correction methods generally build classifiers on either the clean data set or the unfiltered original data set to correct noisy labels (Nicholson, Sheng et al., 2015; Nicholson et al., 2016; Nicholson, Zhang et al., 2015; Triguero et al., 2014; J. Zhang, Sheng et al., 2018; J. Zhang, Sheng, Wu et al., 2015). Our method is based on the observation that the clean data set still contains some noisy instances and the noisy data set still contains some clean instances. To better utilise the potential information in both sets, our method repeatedly resamples the clean set and the noisy set according to a certain proportion. Multiple classifiers are then built on the resampled data sets, and the most likely label is ascribed to every instance in the two sets. The major innovations and contributions of this study can be summarised as follows: 1) utilise both the clean set and the noisy set to correct noisy instances; 2) apply the idea of random resampling to noise correction and propose a resampling-based noise correction method (RNC for short); 3) demonstrate, through experiments on both simulated and real-world data, the effectiveness of the proposed label noise correction method.
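The resample-train-relabel loop described above can be sketched as follows. This is a simplified toy version (1-D instances, a nearest-centroid stand-in for the classifier, hypothetical sampling proportions), intended only to convey the shape of the RNC idea, not the authors' implementation:

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical 1-D instances with integrated (possibly noisy) labels.
# A noise filter has already split them into a "clean" set and a
# "noisy" set; both may still contain mislabeled instances.
clean_set = [(0.1, 0), (0.2, 0), (0.9, 1), (0.8, 1), (0.15, 0), (0.85, 1)]
noisy_set = [(0.12, 1), (0.88, 0), (0.2, 0)]  # first two are mislabeled

def nearest_centroid_fit(train):
    """Toy classifier: one centroid per class label."""
    cents = {}
    for label in {lab for _, lab in train}:
        xs = [x for x, lab in train if lab == label]
        cents[label] = sum(xs) / len(xs)
    return cents

def predict(cents, x):
    return min(cents, key=lambda lab: abs(cents[lab] - x))

# RNC idea (sketched): repeatedly resample from BOTH sets, train a
# classifier on each resample, and relabel every instance by the
# majority vote of the resulting ensemble.
def rnc_correct(clean_set, noisy_set, rounds=15, clean_ratio=0.8):
    everything = clean_set + noisy_set
    votes = [Counter() for _ in everything]
    for _ in range(rounds):
        sample = (random.sample(clean_set, int(clean_ratio * len(clean_set)))
                  + random.sample(noisy_set, 1))
        cents = nearest_centroid_fit(sample)
        for i, (x, _) in enumerate(everything):
            votes[i][predict(cents, x)] += 1
    return [(x, votes[i].most_common(1)[0][0])
            for i, (x, _) in enumerate(everything)]

corrected = rnc_correct(clean_set, noisy_set)
print(corrected)  # the two mislabeled noisy instances get relabeled
```

The key design choice, per the paper's contribution list, is that the resamples draw from both sets, so the occasional clean instances trapped in the noisy set still contribute to training rather than being discarded outright.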