Input Data Component
Published in Tanya Kolosova, Samuel Berestizhevsky, Supervised Machine Learning, 2020
Tanya Kolosova, Samuel Berestizhevsky
Because input datasets are usually flawed, data cleansing is a critical step in the machine learning process. Data cleansing is the set of procedures used to identify inaccurate, incomplete, or improbable data and, where possible, to correct it, making the dataset more suitable for machine learning. Put simply, the quality of the training data determines the performance of the machine learning algorithms. Even data that appears to be of good quality can harbor hidden biases in the training set. The input data component is therefore used to define data cleansing processes, including the detection and treatment of outliers and the correction of biases. Data cleansing usually comprises two stages, detection and correction, and we define these stages as metadata in the data dictionary for the input data component.
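As a minimal sketch of the two stages described above, assuming numeric data and a simple interquartile-range rule (the 1.5 × IQR fence is a common convention, not something the excerpt prescribes), detection and correction could be separated like this:

```python
import statistics

def fences(values, k=1.5):
    """Compute the lower/upper plausibility fences from Q1, Q3, and the IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def detect_outliers(values, k=1.5):
    """Detection stage: flag values outside the IQR fences."""
    low, high = fences(values, k)
    return [v < low or v > high for v in values]

def correct_outliers(values, k=1.5):
    """Correction stage: clip (winsorize) values back to the fences."""
    low, high = fences(values, k)
    return [min(max(v, low), high) for v in values]

data = [10.2, 9.8, 10.1, 10.4, 9.9, 55.0]  # 55.0 is an implausible reading
flags = detect_outliers(data)              # only the last value is flagged
cleaned = correct_outliers(data)           # 55.0 is pulled back to the fence
```

In a metadata-driven design, the rule (`k`, the fences, the chosen correction) would live in the data dictionary rather than in code, so each column can carry its own detection and correction policy.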
Introduction to Machine Learning for Data Analytics
Published in K. Hemachandran, Sayantan Khanra, Raul V. Rodriguez, Juan R. Jaramillo, Machine Learning for Business Analytics, 2023
L. K. Indumathi, Abdul Rais, Juvairia Begum
Data cleaning entails correcting grammatical and spelling errors, repairing faults such as missing codes and empty fields, finding duplicate data points, and standardizing datasets, among other things. It is a basic aspect of the data science fundamentals and plays a significant role both in the analysis procedure and in producing trustworthy results. Data cleaning services are designed to create uniform, standardized datasets that give analytical tools and predictive models reliable access to precise figures for every challenge.
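Two of the tasks listed above, standardization and duplicate detection, can be sketched together: records often differ only in formatting, so normalizing them first makes true duplicates comparable. This is an illustrative sketch with made-up field names, not a reference implementation:

```python
def standardize(record):
    """Normalize formatting so equivalent entries compare equal:
    trim whitespace, lowercase text, and map empty strings to None."""
    return {k: (v.strip().lower() or None) if isinstance(v, str) else v
            for k, v in record.items()}

def deduplicate(records):
    """Drop exact duplicates after standardization, keeping the first occurrence."""
    seen, unique = set(), []
    for rec in map(standardize, records):
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

rows = [
    {"city": "Boston ", "code": "MA"},
    {"city": "boston", "code": "MA"},   # duplicate once standardized
    {"city": "Austin", "code": ""},     # empty field surfaces as None
]
clean = deduplicate(rows)
```

Standardizing before deduplicating matters: without it, `"Boston "` and `"boston"` would survive as two distinct records.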
Toward Data Integration in the Era of Big Data
Published in Archana Patel, Narayan C. Debnath, Bharat Bhushan, Semantic Web Technologies, 2023
Houda EL Bouhissi, Archana Patel, Narayan C. Debnath
Step 2. Data cleansing: Data cleansing is the process of eliminating or correcting data that is erroneous, incomplete, irrelevant, duplicated, or poorly formatted in order to prepare it for analysis. Such data is rarely necessary or beneficial for the analysis, and it can slow down the management process or produce erroneous results.
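A cleansing pass of this kind can be sketched as a filter that keeps complete, plausible records and sets the rest aside. The field names, plausibility range, and the decision to drop rather than repair are all assumptions for illustration:

```python
# Hypothetical sensor records: keep a reading only if it is complete and
# physically plausible; project away fields irrelevant to the analysis.
RELEVANT_FIELDS = ("timestamp", "value")

def cleanse(records, lo=0.0, hi=100.0):
    """Split records into (kept, rejected) after dropping irrelevant fields."""
    kept, rejected = [], []
    for rec in records:
        row = {k: rec.get(k) for k in RELEVANT_FIELDS}
        complete = all(row[k] is not None for k in RELEVANT_FIELDS)
        plausible = complete and lo <= row["value"] <= hi
        (kept if plausible else rejected).append(row)
    return kept, rejected

raw = [
    {"timestamp": "2022-01-01T00:00", "value": 12.5, "debug_flag": 1},
    {"timestamp": "2022-01-01T01:00", "value": None},    # incomplete
    {"timestamp": "2022-01-01T02:00", "value": -999.0},  # erroneous sentinel
]
kept, rejected = cleanse(raw)
```

Keeping the rejected records (rather than silently discarding them) makes the cleansing step auditable, which is often required before integration.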
Short term wind speed forecasting using time series techniques
Published in Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, 2022
Shreya Sajid, Surender Reddy Salkuti, Praneetha C, Nisha K
Data processing involves converting the raw data into a desirable format. The purpose of data cleaning is to eliminate unnecessary noise and irrelevant fields from the data in order to improve its quality. A preliminary study found that the selected dataset was already well structured and required little to no cleanup. The following steps were performed to ready the data for the analysis:
- The time and wind speed columns were extracted for each of the three subsets of the principal dataset.
- Missing value check: there were no null values present in the dataset.
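The two preparation steps described here, column extraction and a missing-value check, can be sketched with the standard library alone. The column names and the inline sample are placeholders standing in for one subset of the principal dataset:

```python
import csv
import io

# Tiny stand-in for one subset of the dataset (column names assumed).
raw = io.StringIO(
    "time,wind_speed,station\n"
    "00:00,5.1,A\n"
    "00:10,4.8,A\n"
    "00:20,5.3,A\n"
)

# Step 1: extract only the time and wind speed columns.
rows = [{"time": r["time"], "wind_speed": r["wind_speed"]}
        for r in csv.DictReader(raw)]

# Step 2: missing-value check -- count empty entries per retained column.
missing = {col: sum(1 for r in rows if not r[col])
           for col in ("time", "wind_speed")}
```

On this sample, as on the dataset described in the text, the check reports zero missing values, so no imputation step is needed before the time-series analysis.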