Selection and Design of the Experiment
Published in Marian Muste (Editor-in-Chief), Dennis A. Lyn, David M. Admiraal, Robert Ettema, Vladimir Nikora, Marcelo H. Garcia, Experimental Hydraulics: Methods, Instrumentation, Data Processing and Management, 2017
Marian Muste, Dennis A. Lyn, David M. Admiraal, Robert Ettema, Vladimir Nikora, Marcelo H. Garcia
Data-processing is an eventual step in the experimental process. It includes filtering and proofing of data, and involves eliminating erroneous data. Data-reduction entails reorganizing data into usable forms, then calculating values of meaningful variables or parameters. Reduced data may then be analyzed to evaluate initial hypotheses, compare different models, or meet other purposes. Rigorous data analysis can be a difficult part of an experiment, because it requires insight into the correct theoretical relationships between parameters. Uncertainty analysis shows how errors introduced by instruments and methods may propagate through to the final results provided by an experiment. Such analysis is important for assessing the significance of the results from an experiment and for revealing means of further improving the experimental process. An experiment is fully useful when accurately reported. Therefore, experiments require careful recording and communication of the activities described here. Throughout all phases it is important to review and cite relevant literature. Published literature, especially on similar prior experiments, can help guide the design and performance of an experiment and can provide useful context for evaluating the results it produces.
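To illustrate the uncertainty-propagation step mentioned above, the sketch below combines independent standard uncertainties of the inputs into the uncertainty of a reduced result using a first-order (Taylor-series) approximation with numerically estimated sensitivity coefficients. This is only one common scheme, not the chapter's own procedure, and the function and variable names are hypothetical.

```python
# Minimal sketch of first-order uncertainty propagation (illustrative only):
# propagate instrument uncertainties u_x through a reduced result r = f(x).
import numpy as np

def propagate_uncertainty(f, x, u_x, eps=1e-6):
    """Combine standard uncertainties u_x of independent inputs x
    into the standard uncertainty of the result r = f(x)."""
    x = np.asarray(x, dtype=float)
    u_x = np.asarray(u_x, dtype=float)
    r0 = f(x)
    u_r_sq = 0.0
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps * max(abs(x[i]), 1.0)
        dfdx = (f(x + dx) - r0) / dx[i]   # numerical sensitivity coefficient
        u_r_sq += (dfdx * u_x[i]) ** 2    # sum of squared contributions
    return np.sqrt(u_r_sq)

# Example: a product-type result r = x0 * x1 with 1% and 2% input uncertainties
print(propagate_uncertainty(lambda v: v[0] * v[1], [2.0, 3.0], [0.02, 0.06]))
```

In practice, the sensitivity coefficients would come from the actual data-reduction equation used in the experiment.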
Principal Component Analysis
Published in Jason M. Kinser, Image Operators, 2018
Data generated from experiments may contain several dimensions and be quite complicated. However, the dimensionality of the data may far exceed the complexity of the data. A reduction in dimensionality often allows simpler algorithms to effectively analyze the data. A common method of data reduction is principal component analysis (PCA).
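A minimal sketch of the idea, assuming NumPy and a plain eigen-decomposition of the covariance matrix (not the book's own implementation; the function name pca_reduce is hypothetical):

```python
# Minimal PCA sketch: project samples onto the directions of largest variance,
# i.e. the leading eigenvectors of the covariance matrix.
import numpy as np

def pca_reduce(data, n_components):
    """Project rows of `data` onto the top `n_components` principal axes."""
    centered = data - data.mean(axis=0)                # remove the mean
    cov = np.cov(centered, rowvar=False)               # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]   # largest variance first
    return centered @ eigvecs[:, order]

# Example: 100 samples with 10 correlated dimensions reduced to 2
rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 10))
print(pca_reduce(samples, 2).shape)   # (100, 2)
```

Because the 10 observed dimensions are generated from only 3 underlying factors, a handful of components captures most of the variance, which is exactly the situation where PCA-based data reduction pays off.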
The evolution of recommender systems: From the beginning to the Big Data era
Published in Matthias Dehmer, Frank Emmert-Streib, Frontiers in Data Science, 2017
Beatrice Paoli, Monika Laner, Beat Tödtli, Jouri Semenov
Due to the diversity of sources, the quality of the collected dataset may vary in terms of noise, redundancy, consistency, and so on. Prior to data analysis, the data must be prepared. Therefore, data-preprocessing techniques including data cleaning, data integration, data reduction, and data transformation should be applied to remove noise and inconsistencies. Each subprocess faces a different challenge with respect to data-driven applications. The most relevant types of data preprocessing are as follows:
Data cleaning: Data cleaning identifies and corrects errors and removes noise. The key issue is to retain relevant data while discarding unimportant data. Data cleaning identifies inaccurate or incomplete data and repairs or deletes it to improve quality [43,44]. Common techniques of data cleaning are filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies. Methods used in data cleaning are statistical, clustering, pattern-based, and parsing methods, association rules, and outlier identification [45,46].
Data integration: Data integration techniques merge data from different sources, provide a unified view of the data and coherent storage, and detect and resolve data value conflicts.
Data reduction: Data reduction reduces the data volume by aggregating data and eliminating redundancies, and generates a representative and much smaller dataset that produces (nearly) the same analytical results as the original raw dataset. Commonly used techniques are data compression, clustering, sampling, dimension reduction, heuristic methods, regression, feature selection, and feature discretization [47].
Data transformation: Data transformation can construct or aggregate new attributes and new features. Typical data transformation techniques are smoothing for removing noise from data, aggregation to summarize data, and normalization techniques such as min–max normalization and z-score standardization [47].
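The sketch below illustrates a few of the steps listed above (filling missing values, dropping a crude outlier, and min–max and z-score transformations) on a small hypothetical rating table. It assumes pandas, and the table, thresholds, and column names are illustrative only, not taken from the chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical rating table with a missing value and one implausible entry.
df = pd.DataFrame({
    "user": list("ABCDEFGH"),
    "rating": [4.0, 5.0, np.nan, 3.0, 4.0, 5.0, 2.0, 50.0],
})

# Data cleaning: fill the missing rating with the column mean ...
df["rating"] = df["rating"].fillna(df["rating"].mean())
# ... and drop ratings far from the mean (a loose 2-sigma rule for this toy set).
z = (df["rating"] - df["rating"].mean()) / df["rating"].std(ddof=0)
df = df[z.abs() <= 2].copy()

# Data transformation: min-max normalization and z-score standardization.
r = df["rating"]
df["rating_minmax"] = (r - r.min()) / (r.max() - r.min())
df["rating_zscore"] = (r - r.mean()) / r.std(ddof=0)
print(df)
```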
Big Data Classification Using Enhanced Dynamic KPCA and Convolutional Multi-Layer Bi-LSTM Network
Published in IETE Journal of Research, 2023
Following the collection of raw data, the data must be delivered in the appropriate structure, amount, and format for the data analytic tasks that follow. Data pre-processing is used to meet this need, and it consists primarily of four tasks: data cleansing, data transformation, data reduction, and data partitioning. Data cleaning seeks to improve data quality by removing duplicate or irrelevant observations and outliers and by filling in missing values. Data transformation is performed when specialized modeling methods demand a specific data attribute type (for example, categorical or numerical) or data scale. The goal of data reduction is to find the most important factors or variables for modeling, minimize dataset size, and increase calculation efficiency. The goal of data partitioning is to break down a huge dataset into numerous smaller sets or groups that may be evaluated independently to increase the model's sensitivity and robustness. Preprocessing of the data is also done using weights allocated to features based on size, content, importance, relevance, and keywords. Here, an automated weight assignment technique is used.
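A minimal sketch of the data-partitioning step, assuming NumPy, is given below. The function name partition and the fold count are hypothetical; this is not the paper's pipeline, only an illustration of splitting one large dataset into smaller groups that can be evaluated independently.

```python
# Sketch: shuffle a dataset and split it into k roughly equal partitions.
import numpy as np

def partition(data, k=5, seed=0):
    """Return k roughly equal row-wise partitions of `data`."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    return [data[part] for part in np.array_split(idx, k)]

big_dataset = np.arange(1000).reshape(100, 10)   # 100 samples, 10 features
folds = partition(big_dataset, k=5)
print([f.shape for f in folds])                  # five (20, 10) blocks
```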
Big Data Analytics in Cyber Security: Network Traffic and Attacks
Published in Journal of Computer Information Systems, 2021
The main purpose of the analysis of network traffic and duplicates is to seek approaches to data reduction, which is especially important for big data of huge volume. Generally, data reduction can be accomplished by removing redundant variables (or attributes), clustering, eliminating duplicate instances, etc. Clustering analysis of network traffic is useful in intrusion detection because malicious activity often separates itself from normal activity. Clustering also helps discover trends in the data and activities. If a variable can be “derived” from another variable or a set of other variables, it is redundant. Duplicate instances in a dataset have a negative effect on the training process of machine learning. Removing redundant variables and duplicate instances helps improve the efficiency and accuracy of machine learning and data mining. Many redundancies can be detected by correlation analysis. A strong correlation between two variables indicates that they carry much overlapping information and one of them can be removed. Principal component analysis (PCA) has been used for dimensionality reduction and for feature extraction from high-dimensional datasets, and it is especially effective on datasets with redundant (strongly correlated) variables. Correlation analysis of a dataset can therefore be performed to decide whether PCA should be conducted to remove redundant variables and reduce the dimensionality of the dataset.
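A minimal sketch of such a correlation check, assuming NumPy, is shown below. The function name redundant_pairs and the 0.9 threshold are arbitrary choices for illustration, not the paper's code.

```python
# Sketch: flag redundant (strongly correlated) variables before deciding
# whether PCA or variable removal is warranted.
import numpy as np

def redundant_pairs(data, threshold=0.9):
    """Return index pairs of variables whose absolute Pearson
    correlation exceeds `threshold`."""
    corr = np.corrcoef(data, rowvar=False)
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) > threshold]

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 4))
x[:, 3] = 2.0 * x[:, 0] + 0.05 * rng.normal(size=500)   # near-duplicate variable
print(redundant_pairs(x))   # [(0, 3)] -> candidates for removal or PCA
```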
Discrete wavelet transform-based freezing of gait detection in Parkinson’s disease
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2021
Amira El-Attar, Amira S. Ashour, Nilanjan Dey, Hatem Abdelkader, Mostafa M. Abd El-Naby, R. Simon Sherratt
To assist PD patients, Bächlin et al. (2010) implemented a wearable device to monitor the FoG symptom. An analysis of the inherent frequency components of the movements has been used to detect FoG automatically. The leg’s motion has been sampled at 64 Hz and analysed with a rectangular window function of 4-second length and 0.5-second steps. A 256-point FFT has been calculated in order to determine the PSD. The results of FoG detection established sensitivity and specificity of 73.1% and 81.6%, respectively. Mazilu et al. (2013) compared the performance of several feature extraction approaches based on statistical/time-domain features for the detection/prediction of FoG cases. Rezvanian and Lockhart (2016) identified FoG using the continuous wavelet transform (CWT). The authors proposed an index from the CWT components, which discriminated FoG in the anterior-posterior axis. A 2-second window size provided specificity and sensitivity of 77.1% and 82.1%, respectively. Accordingly, FoG detection techniques can be categorised into two main methods, namely, FoG detection based on the freezing index with threshold algorithms and machine learning-based FoG detection techniques. A wearable assistant has been proposed in practice to monitor people with PD for FoG detection (Jha, 2016). There are many data reduction techniques, including dimensionality reduction and data compression (Han, Pei, & Kamber, 2011). Furthermore, the DWT has been applied as a type of data reduction.
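The windowing scheme summarised above (64 Hz sampling, 4-second rectangular windows advanced in 0.5-second steps, a 256-point FFT per window for the PSD) can be sketched as follows. This is an illustrative NumPy reconstruction with a simplified PSD scaling, not the original authors' implementation, and the synthetic leg-motion signal is hypothetical.

```python
# Sketch: sliding-window PSD estimation with the parameters described above.
import numpy as np

FS = 64                      # sampling rate in Hz
WIN = 4 * FS                 # 4-second window -> 256 samples
STEP = FS // 2               # 0.5-second step -> 32 samples

def windowed_psd(signal):
    """Return one PSD estimate (256-point FFT) per sliding window."""
    psds = []
    for start in range(0, len(signal) - WIN + 1, STEP):
        frame = signal[start:start + WIN]          # rectangular window
        spectrum = np.fft.rfft(frame, n=256)
        psds.append((np.abs(spectrum) ** 2) / (FS * WIN))
    return np.array(psds)

# Example: 30 s of synthetic leg-motion data with two frequency components
t = np.arange(0, 30, 1 / FS)
acc = np.sin(2 * np.pi * 1.0 * t) + 0.3 * np.sin(2 * np.pi * 6.0 * t)
print(windowed_psd(acc).shape)    # (number of windows, 129 frequency bins)
```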