Implementation of the Quality-of-Service Framework Using IoT-Fog Computing for Smart Healthcare
Published in Monika Mangla, Ashok Kumar, Vaishali Mehta, Megha Bhushan, Sachi Nandan Mohanty, Real-Life Applications of the Internet of Things, 2022
Data transformation plays an important role in shaping data into the form best suited for analysis. Data mining offers a range of transformation techniques, including smoothing, aggregation, generalization, normalization, and attribute construction. The common purpose of these methods is to transform the given data into the most appropriate form, either by reducing its size without compromising its integrity and meaningfulness or by adding new attributes. In aggregation, summary operations are applied to consolidate the given data into a more concise and effective form. Generalization is another powerful technique, rooted in the concept hierarchy [30]. In normalization, numerical attributes that take a large set of distinct values are rescaled to lie within a smaller specified range. In attribute construction, new attributes are built to enhance analysis performance. Thus, data transformation improves a system's performance by increasing the analysis rate, reducing the data size, and supplying the data in the required format.
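A minimal sketch of three of these transformations (aggregation, normalization, and attribute construction) is given below, using pandas on a small hypothetical patient-monitoring table; the column names and values are illustrative, not taken from the chapter.

```python
# Illustrative sketch of aggregation, min-max normalization, and
# attribute construction on hypothetical patient data.
import pandas as pd

df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2],
    "heart_rate": [72, 75, 88, 84],
    "weight_kg": [70.0, 70.0, 95.0, 95.0],
    "height_m": [1.75, 1.75, 1.80, 1.80],
})

# Aggregation: summarize repeated readings per patient.
summary = df.groupby("patient_id")["heart_rate"].mean()

# Normalization: rescale heart_rate into the [0, 1] range.
hr = df["heart_rate"]
df["heart_rate_norm"] = (hr - hr.min()) / (hr.max() - hr.min())

# Attribute construction: derive a new attribute (BMI) from existing ones.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

print(summary)
print(df)
```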
Exploratory Data Analysis and Data Visualization
Published in Chong Ho Alex Yu, Data Mining and Exploration, 2022
2) Re-expression (data transformation): When the distribution is skewed or the data structure obscures the pattern, the data can be rescaled in order to improve interpretability. Typical examples of data transformation include: using log transformation or inverse probability transformation to normalize a distribution, using square root transformation to stabilize variances, and using logarithmic transformation to linearize a trend.
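The following sketch illustrates the three re-expressions named above on synthetic NumPy data; the generated samples and the exponential trend are assumptions chosen to make each effect visible, not examples from the book.

```python
# Illustrative re-expressions: log transform to normalize a skewed
# distribution, square root to stabilize variance, log to linearize.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Log transformation: a right-skewed (log-normal) sample becomes
# roughly normal after taking logs.
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
logged = np.log(skewed)
print("skewness before:", stats.skew(skewed), "after:", stats.skew(logged))

# Square root transformation: stabilizes the variance of count data
# (for Poisson-like counts, the variance grows with the mean).
counts = rng.poisson(lam=20, size=1000)
stabilized = np.sqrt(counts)

# Logarithmic transformation to linearize an exponential trend:
# y = a * exp(b * x)  =>  log(y) = log(a) + b * x.
x = np.linspace(0, 5, 50)
y = 2.0 * np.exp(0.8 * x)
b, log_a = np.polyfit(x, np.log(y), 1)  # recovers b ≈ 0.8
print("recovered slope b:", b)
```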
Instigation and Development of Data Science
Published in Pallavi Vijay Chavan, Parikshit N Mahalle, Ramchandra Mangrulkar, Idongesit Williams, Data Science, 2022
Priyali Sakhare, Pallavi Vijay Chavan, Pournima Kulkarni, Ashwini Sarode
Some models require the data to be in a different shape. Now that the data have been cleaned and integrated, this is the next task to perform: data transformation converts the data into a form suitable for modeling.
Current and future role of data fusion and machine learning in infrastructure health monitoring
Published in Structure and Infrastructure Engineering, 2023
Hao Wang, Giorgio Barone, Alister Smith
Data transformation is the process of standardising and/or regularising raw data to improve comparability between features and to fulfil the prerequisites of various algorithms. Scaling methods such as standardisation and min-max scaling map the raw data onto a predefined range, which increases comparability between features. Because many algorithms measure distances (e.g. instance-based algorithms, clustering), which are sensitive to the magnitude of the data and the range across features, preliminary scaling is often essential.
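A minimal sketch of the two scaling methods mentioned above follows, using scikit-learn; the feature matrix is hypothetical monitoring data (e.g. strain and temperature on very different scales), not from the article.

```python
# Standardisation vs. min-max scaling on two features whose magnitudes
# differ greatly; without scaling, distance-based algorithms would be
# dominated by the larger-magnitude feature.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1200.0, 18.5],
              [ 950.0, 21.0],
              [1500.0, 19.2]])

# Standardisation: zero mean, unit variance per feature.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: map each feature into a predefined range, here [0, 1].
X_mm = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

print(X_std)
print(X_mm)
```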
A Systematic Overview of Android Malware Detection
Published in Applied Artificial Intelligence, 2022
Li Meijin, Fang Zhiyang, Wang Junfeng, Cheng Luyu, Zeng Qi, Yang Tao, Wu Yinwei, Geng Jiaxuan
(4) Data transformation. Data transformation converts data from one form to another. The most commonly used research method converts the extracted features into images and feeds them into a deep neural network.
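A sketch of this feature-to-image conversion is given below. The feature vector contents and the 16×16 target size are assumptions for illustration; the survey does not prescribe a specific layout.

```python
# Reshape a flat feature vector (e.g. byte or opcode frequencies
# extracted from an APK) into a grayscale image for a CNN.
import numpy as np

features = np.random.randint(0, 256, size=256, dtype=np.uint8)  # stand-in features

# Pad to the next perfect square so the vector fills a square image.
side = int(np.ceil(np.sqrt(features.size)))
padded = np.zeros(side * side, dtype=np.uint8)
padded[:features.size] = features

image = padded.reshape(side, side)           # 16x16 grayscale "feature image"
image_norm = image.astype(np.float32) / 255  # scale to [0, 1] for the network
print(image_norm.shape)
```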
Grey wolf optimizer-based machine learning algorithm to predict electric vehicle charging duration time
Published in Transportation Letters, 2022
Irfan Ullah, Kai Liu, Toshiyuki Yamamoto, Md Shafiullah, Arshad Jamal
Real-world data are usually noisy, incomplete, and inconsistent. Before any analysis, data need preprocessing: cleaning, integration, reduction, and transformation.

Data cleaning seeks to address inconsistencies by filling in missing values, smoothing out noise while identifying outliers, and correcting discrepancies in the data. Missing values can be filled in manually; if the data are normally distributed, the attribute's mean value can replace a missing value, whereas if the data are non-normal, the median can be used. The term 'noisy' generally refers to random inaccuracy or the presence of unnecessary data points. Binning is a method for handling noisy data: the data are first sorted, and the sorted values are then split and stored in bins.

In data integration, data are combined from numerous sources. Data integration is an essential process for data analysis; careful integration can reduce or eliminate inconsistencies and redundancies, and some redundancies can be detected by correlation analysis.

The data reduction procedure reduces the data volume, making analysis easier while producing the same results. Data compression, numerosity reduction, and dimensionality reduction are some of the approaches used; numerosity and dimensionality reduction techniques are also considered forms of data compression. Numerosity reduction shrinks the data representation by lowering its volume without data loss. These methods can be either parametric or nonparametric: log-linear models and regression are examples of parametric methods, whereas histograms, clustering, and sampling are nonparametric. In dimensionality reduction, principal components analysis and wavelet transforms are used to project the original data into a smaller space.

Data transformation refers to changing the format or structure of data. Depending on the requirements, this process can be easy or difficult. Smoothing, aggregation, discretization, and normalization are a few of the methods used; for instance, normalization scales attribute data to a small range, such as 0.0 to 1.0. The development of concept hierarchies and data discretization are two further examples.
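A minimal sketch of three of these preprocessing steps (median imputation, smoothing by binning, and min-max normalization) follows, using pandas on a hypothetical charging-duration column; the column name and values are illustrative, not from the paper.

```python
# Median imputation, equal-frequency binning (smoothing by bin means),
# and min-max normalization on a hypothetical duration column.
import numpy as np
import pandas as pd

df = pd.DataFrame({"duration_min": [30.0, np.nan, 45.0, 38.0, np.nan, 300.0]})

# Cleaning: fill missing values with the mean (roughly normal data)
# or the median (skewed data); the median is used here.
df["duration_min"] = df["duration_min"].fillna(df["duration_min"].median())

# Smoothing by binning: sort the values, split them into equal-frequency
# bins, and replace each value by its bin mean.
s = df["duration_min"].sort_values()
bins = np.array_split(s, 3)
smoothed = pd.concat(
    [pd.Series(b.mean(), index=b.index) for b in bins]
).sort_index()

# Transformation: min-max normalization into the range [0.0, 1.0].
d = df["duration_min"]
df["duration_norm"] = (d - d.min()) / (d.max() - d.min())

print(df.assign(duration_smoothed=smoothed))
```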