Data Acquisition and Intelligent Diagnosis
Published in Diego Galar, Uday Kumar, Dammika Seneviratne, Robots, Drones, UAVs and UGVs for Operation and Maintenance, 2020
Diego Galar, Uday Kumar, Dammika Seneviratne
Zhang, Zhang, and Yang (2003) argue that data preparation is important for three reasons: real-world data are impure; high-performance mining systems require quality data; and quality data yield high-quality patterns. Real data can disguise useful patterns when they are incomplete, noisy, or inconsistent. Data preparation generates new datasets that are smaller than the original ones by selecting relevant data, recovering incomplete data, purifying data, reducing data, and resolving data conflicts (Diaz, Herrera, Izquierdo, & Pérez-Garcia, 2010).
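As a minimal illustration of these steps in Python with pandas (the column names, value ranges, and median imputation are assumptions made for the example, not details from the cited works):

```python
import pandas as pd

# Hypothetical raw sensor table with missing, duplicate, and noisy rows.
raw = pd.DataFrame({
    "sensor_id":   [1, 1, 2, 2, 3],
    "temperature": [21.5, 21.5, None, 19.8, 250.0],
    "comment":     ["ok", "ok", "ok", "ok", "ok"],
})

# Select relevant data: keep only the columns needed for mining.
data = raw[["sensor_id", "temperature"]]

# Recover incomplete data: impute missing readings with the column median.
data = data.fillna({"temperature": data["temperature"].median()})

# Purify data: drop exact duplicates and implausible (noisy) readings.
data = data.drop_duplicates()
data = data[data["temperature"].between(-40.0, 60.0)]

print(data)  # a smaller, cleaner dataset than the original
```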
Instigation and Development of Data Science
Published in Pallavi Vijay Chavan, Parikshit N Mahalle, Ramchandra Mangrulkar, Idongesit Williams, Data Science, 2022
Priyali Sakhare, Pallavi Vijay Chavan, Pournima Kulkarni, Ashwini Sarode
Data cleansing is a sub-process of the data preparation phase of the data science process that focuses on removing errors so that the data become a consistent and true representation of the process from which they originated. Errors fall into two types: interpretation errors and inconsistency errors. An interpretation error occurs when the value of the data is taken for granted, whereas an inconsistency error occurs when there are inconsistencies between the data sources or against the company’s standard value.
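A minimal sketch in Python of how these two error types might be detected (the tables, column names, and the 0–120 age standard are illustrative assumptions, not from the chapter):

```python
import pandas as pd

# Two hypothetical sources that record the same customers' ages.
source_a = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 27, 410]})
source_b = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 72, 410]})

merged = source_a.merge(source_b, on="customer_id", suffixes=("_a", "_b"))

# Inconsistency errors: the same record disagrees between the sources.
inconsistent = merged[merged["age_a"] != merged["age_b"]]
print(inconsistent)  # customer 2: 27 vs. 72

# Interpretation errors: a value taken for granted that violates the
# company's standard (here, an assumed plausible age range of 0-120).
implausible = merged[~merged["age_a"].between(0, 120)]
print(implausible)   # customer 3: age 410
```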
Big Data Optimization in Electric Power Systems: A Review
Published in Ahmed F. Zobaa, Trevor J. Bihl, Big Data Analytics in Future Power Systems, 2018
Iman Rahimi, Abdollah Ahmadi, Ahmed F. Zobaa, Ali Emrouznejad, Shady H.E. Abdel Aleem
To support a big data-based project, one first needs to analyze the data. There are specific data management tools for storing and analyzing large-scale data. Even in a simple project, several steps must be performed. Figure 4.1 shows these steps, which include data preparation, analysis, validation, collaboration, reporting, and access. They are briefly described as follows:
Data preparation is the process of collecting, cleaning, and consolidating data into one file or data table to be used in the analysis.
Data analysis is the process of inspecting, cleansing, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making.
Data validation is the process of ensuring that data have undergone cleansing so that they are of acceptable quality, correct, and useful (a minimal check of this kind is sketched after this list).
Data collaboration means visualizing data from all the available data sources while getting the data from the right people, in the right format, to be used in making effective decisions.
Data reporting is the process of collecting and submitting data to authorities, augmented with statistics.
Data access typically refers to software and activities related to storing, retrieving, or acting on data housed in a database or other repository.
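One of these steps, data validation, can be sketched as a simple rule-based check in Python with pandas (the column names and rules below are assumptions for illustration, not part of the chapter):

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list:
    """Return a list of data-quality problems found in the table."""
    problems = []
    if df["load_mw"].isna().any():
        problems.append("missing load values")
    if (df["load_mw"] < 0).any():
        problems.append("negative load values")
    if df.duplicated(subset="timestamp").any():
        problems.append("duplicate timestamps")
    return problems

readings = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-01-01 00:00",
                                 "2018-01-01 00:00",
                                 "2018-01-01 01:00"]),
    "load_mw": [512.3, 512.3, None],
})
print(validate(readings))  # ['missing load values', 'duplicate timestamps']
```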
Forecasting Uncertainty Parameters of Virtual Power Plants Using Decision Tree Algorithm
Published in Electric Power Components and Systems, 2023
Raji Krishna, Hemamalini Sathish, Ning Zhou
Data preparation is an essential task in DT algorithm modeling, which includes data understanding, modeling, preprocessing, testing, and deployment. For predictive modeling, the initial data are translated into the DT algorithm. Data preparation for analysis involves several steps, such as merging data from different sources, identifying outliers, detecting missing values, recognizing data that are not in the correct format, and normalizing the data. Once these steps have been completed, the data are in a suitable format for analysis. From the time stamps, additional features such as the year, month, day, and hour are derived. The dataset is divided into two parts, with 80% (7008 samples) allocated for training and the remaining 20% (1752 samples) designated for validation and testing. To estimate the uncertainty parameters, the proposed methodology is implemented on a computer with the following specifications: 8.00 GB of RAM, a 1.99 GHz Intel(R) Core(TM) i7-8550U processor, and the MATLAB 2022a app toolbox.
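A minimal Python sketch of the timestamp-feature extraction and the 80/20 split described above (the synthetic hourly index and placeholder target are assumptions; 8760 hourly rows reproduce the 7008/1752 split from the paper):

```python
import pandas as pd

# Hypothetical hourly series: 8760 rows, matching the 7008/1752 split above.
df = pd.DataFrame({"timestamp": pd.date_range("2022-01-01", periods=8760, freq="h")})
df["load"] = 0.0  # placeholder target column for illustration

# Derive additional calendar features from the time stamps.
df["year"]  = df["timestamp"].dt.year
df["month"] = df["timestamp"].dt.month
df["day"]   = df["timestamp"].dt.day
df["hour"]  = df["timestamp"].dt.hour

# Chronological 80/20 split: 7008 rows for training, 1752 for validation/testing.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]
print(len(train), len(test))  # 7008 1752
```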
Enhancing resilience in marine propulsion systems by adopting machine learning technology for predicting failures and prioritising maintenance activities
Published in Journal of Marine Engineering & Technology, 2023
Mohsen Elmdoost-gashti, Mahmood Shafiee, Ali Bozorgi-Amiri
Data preparation is the process of transforming raw data so that data scientists and analysts can run it through ML algorithms and uncover insights or make predictions. In this step, unnecessary features are removed, the predictive labels are determined, and eventually the titles of the features are added to the dataset, which is stored in conventional formats such as CSV or Excel files. Based on the first step of the method, the dataset from the UCI repository was used to evaluate the performance of the proposed methods. This dataset comprises sixteen features and two predictive labels (classes), titled compressor decay state coefficient (CDSC) and turbine decay state coefficient (TDSC), which indirectly indicate the state of a system whose performance has degraded, recorded over the parameter space. In particular, the CDSC has been investigated in the domain [1; 0.95] and the TDSC in the domain [1; 0.975]. Table 1 presents the features of the dataset.
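A minimal Python sketch of this preparation step (the file name, separator, and column layout are assumptions; only the sixteen-feature/two-label structure comes from the text):

```python
import pandas as pd

# Hypothetical layout: sixteen feature columns followed by the two labels.
cols = [f"feature_{i}" for i in range(1, 17)] + ["CDSC", "TDSC"]
df = pd.read_csv("propulsion_data.csv", names=cols)

X = df[cols[:16]]         # the sixteen condition-monitoring features
y = df[["CDSC", "TDSC"]]  # decay state coefficients (the predictive labels)

# Store the titled dataset in a conventional format, as described above.
df.to_csv("prepared_dataset.csv", index=False)
```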
WEL-ODKC: weighted extreme learning optimal diagonal-kernels convolution model for accurate classification of skin lesions
Published in The Imaging Science Journal, 2023
V. Auxilia Osvin Nancy, P. Prabhavathy, Meenakshi S. Arya
Data preparation includes data preprocessing, which is any sort of processing performed on raw data to prepare it for another data processing approach. Preprocessing an image is an important aspect of detection since it enhances the quality of the original image by eliminating noise. It was required in order to narrow the search for abnormalities by removing background components that impact the outcome. The primary purpose of this step is to improve image quality by eliminating unnecessary and unconnected background objects prior to further processing. To remove noise, we employ a median filter to denoise the salt-and-pepper noise in the input image. The RGB source images were initially converted to grayscale to eliminate hair [31]. The black hair outlines were then found in the grayscale images using the blackHat filter; the blackHat image shows the difference between the morphological closing operation and the source image. The contours were then used to create the mask, the mask was reduced to cover only the hair region, and the non-zero (hair) pixels in the source image were replaced using an image-inpainting approach. The images are almost hair-free as a result of this technique. It also removes some information from the images, but the end result is relatively superior. The preprocessing operations applied to an input image are presented in Figure 3.
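A minimal OpenCV sketch of this preprocessing pipeline in Python (the kernel size, threshold value, and file names are assumptions; the sequence of operations follows the description above):

```python
import cv2

# Load the dermoscopic image; the file name is a placeholder.
src = cv2.imread("lesion.jpg")

# Median filter to suppress salt-and-pepper noise.
denoised = cv2.medianBlur(src, 5)

# Convert to grayscale before locating the dark hair outlines.
gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)

# blackHat: the difference between the morphological closing and the
# source image, which highlights thin dark structures such as hairs.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 17))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)

# Threshold the blackHat response to obtain a binary hair mask.
_, mask = cv2.threshold(blackhat, 10, 255, cv2.THRESH_BINARY)

# Inpaint the masked (hair) pixels from their surroundings.
clean = cv2.inpaint(denoised, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("lesion_hair_free.jpg", clean)
```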