The Cognitive IoT:
Published in Pethuru Raj, Anupama C. Raman, Harihara Subramanian, Cognitive Internet of Things, 2022
Pethuru Raj, Anupama C. Raman, Harihara Subramanian
Extract, Transform, and Load (ETL): This is handled by the business intelligence (BI) system. It involves manipulating and processing data from diverse sources in order to convert it into a single format and store it in a repository. As depicted in Figure 3.6, the ETL process involves the following key steps:
Data ingestion: Transferring the collected data from the input source to the output destination.
Data storage: Storing the collected data in a data store with sufficient storage capacity.
Data wrangling: Mapping raw data into another format that supports processing and analysis of the output data.
Business analytics: Further processing the data to make it ready for data modelling; business analytics helps managers view results and other desired outputs in the form of dashboards.
In the next section, we will discuss the architecture and functionalities of some of the major IoT platforms available in the market today.
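To make the four steps concrete, here is a minimal Python sketch of such a pipeline. It is an illustration under assumptions, not the book's implementation: the CSV path, SQLite table, and column names are all invented for the example.

```python
# Minimal ETL sketch following the four steps above.
# File paths, table names, and column names are illustrative assumptions.
import sqlite3
import pandas as pd

# Data ingestion: pull raw records from a (hypothetical) CSV input source.
raw = pd.read_csv("sensor_readings.csv")

# Data storage: persist the collected data in a repository (SQLite here).
conn = sqlite3.connect("repository.db")
raw.to_sql("raw_readings", conn, if_exists="replace", index=False)

# Data wrangling: map the raw data into a single, analysis-ready format.
wrangled = (
    raw.rename(columns=str.lower)
       .dropna()  # drop incomplete records
       .assign(timestamp=lambda d: pd.to_datetime(d["timestamp"]))
)

# Business analytics: aggregate into a dashboard-ready daily summary.
summary = wrangled.groupby(wrangled["timestamp"].dt.date)["value"].mean()
print(summary.head())
```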
Software and Technology Standards as Tools
Published in Jim Goodell, Janet Kolodner, Learning Engineering Toolkit, 2023
Jim Goodell, Andrew J. Hampton, Richard Tong, Sae Schatz
Data lakes are repositories of raw data. They’re places to store all different kinds of data—structured, semi-structured, and unstructured—in a single place “as is.” They’re specifically designed to store, process, and secure large amounts of diverse data.12 The data from a data lake, or from other unstructured or semi-structured repositories, will need to be transformed before it can be effectively analyzed. This process is typically called data wrangling (or data munging), and it includes steps like discovery, structuring, cleaning, and validating the data.13
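A short pandas sketch of those wrangling steps (discovery, structuring, cleaning, validating) applied to semi-structured data pulled from a lake may help; the JSON-lines path and the field names (event_id, user_id) are assumptions for the example.

```python
# Sketch of the wrangling steps named above: discovery, structuring,
# cleaning, and validating. The file and field names are assumptions
# standing in for semi-structured data-lake content.
import json
import pandas as pd

# Discovery: inspect a sample of the raw, semi-structured records.
with open("lake/events.jsonl") as f:
    sample = [json.loads(line) for line in f][:1000]
print(sample[0].keys())

# Structuring: flatten nested records into a tabular frame.
df = pd.json_normalize(sample)

# Cleaning: drop duplicate events and normalize string fields.
df = df.drop_duplicates(subset="event_id")
df["user_id"] = df["user_id"].str.strip()

# Validating: enforce simple invariants before analysis.
assert df["user_id"].notna().all(), "missing user ids after cleaning"
assert (df["user_id"] != "").all(), "empty user ids after cleaning"
```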
Agile Project Management and Data Analytics
Published in Seweryn Spalek, Data Analytics in Project Management, 2018
Data wrangling is an umbrella term used to describe the process of acquiring, transforming, cleansing, and integrating data into one “raw” form that enables the consumption of data (Jurney, 2017). The CRISP-DM stages of data understanding and data preparation require multiple iterations of data wrangling before the modeling stage can begin. Data wrangling also pushes up against the constraints of the time-box. Agile software development (ASD) focuses on the client, the technical team, and the business, but places less value on the data. In data analytics, all of the value is derived from the data.
Artificial Intelligence with Python
Published in Technometrics, 2023
Part II, “Fundamentals of Artificial Intelligence”, comprises six chapters: Introduction to AI, Data Wrangling, Regression, Classification, Clustering, and Association Rules. Part II provides knowledge on creating an efficient ETL pipeline for usable data. First, readers are taught to preprocess data, including handling missing values, duplicates, mapping values, outliers, permutations, joins, reformats, and pivots. Then readers learn about several machine learning models, covering supervised learning (regression and classification) and unsupervised learning (clustering) using the Sklearn and Keras packages. The models studied include linear regression, logistic regression, decision trees, random forests, neural networks, Support Vector Machines (SVM), Naïve Bayes, and the K-means algorithm. In addition to the program code, there is a brief explanation of each of these machine learning algorithms. Part II closes with an introduction to the general concept of association rule mining. Chapter 9, “Data Wrangling,” states that data wrangling is an essential part of the data science role and that most data scientists spend the bulk of their time on it. Unfortunately, these data-wrangling techniques are not explained in detail; there is only a brief explanation of the program code. On the other hand, an interesting subsection in Chapter 10, “Regression,” discusses how to refine a regression model to improve its results. This is essential to know, given that regression modeling emphasizes the accuracy of the predictions obtained.
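For readers unfamiliar with the preprocessing operations the review lists, the following pandas sketch shows one plausible version of each; the DataFrame, columns, and thresholds are invented for illustration and are not taken from the book.

```python
# Illustrative pandas versions of the preprocessing steps listed above.
# The data, column names, and the z-score cutoff are assumptions.
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "LA", None, "NY", "NY"],
    "sales": [10.0, 12.0, 11.0, 10.0, 400.0],
    "quarter": ["Q1", "Q1", "Q2", "Q1", "Q2"],
})

df = df.dropna()                           # missing values
df = df.drop_duplicates()                  # duplicates
df["region"] = df["city"].map({"NY": "East", "LA": "West"})  # mapping values
z = (df["sales"] - df["sales"].mean()) / df["sales"].std()
df = df[z.abs() < 3]                       # outliers (z-score filter)
df = df.sample(frac=1, random_state=0)     # permutation (row shuffle)

targets = pd.DataFrame({"city": ["NY", "LA"], "target": [100, 90]})
df = df.merge(targets, on="city")          # join
wide = df.pivot_table(index="city", columns="quarter", values="sales")  # pivot
print(wide)
```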
Impulsive Behavior Detection System Using Machine Learning and IoT
Published in IETE Journal of Research, 2021
Soumya Jyoti Raychaudhuri, Soumya Manjunath, Chithra Priya Srinivasan, N. Swathi, S. Sushma, Nitin Bhushan K. N., C. Narendra Babu
The data available from the sensors is in an encrypted format; unless it is decrypted into a machine-readable format, it is not usable. Suitable data wrangling/munging techniques therefore need to be applied to the raw sensor data to extract meaningful information from it. In addition, a few sensors output analog data, while the Raspberry Pi processor requires digital input for computation. Hence an Analog to Digital Converter (ADC) was utilized to convert the analog data from the sensors into digital format; the mathematical formulation (5) represents the conversion strategy for the ADC (see the note after this paragraph). Data munging converts raw data into a machine-readable format. The munged data can then be passed on through subsequent layers for pre-processing and further analysis.
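The excerpt does not reproduce formulation (5) itself. The standard conversion relation for an n-bit ADC, which is presumably what (5) refers to (the authors' exact form may differ), is:

$$ D = \left\lfloor \frac{V_{in}}{V_{ref}} \times \left(2^{n} - 1\right) \right\rfloor $$

where $D$ is the digital output code, $V_{in}$ is the analog input voltage, $V_{ref}$ is the ADC reference voltage, and $n$ is the ADC resolution in bits.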
CRISP-eSNeP: Towards a data-driven knowledge discovery process for electronic social networks
Published in Journal of Decision Systems, 2019
Daniel Adomako Asamoah, Ramesh Sharda
There are multiple implications of this methodology for big data projects on social media. Whereas access to big data has significantly increased for most organizations, the ability to design and develop sound theoretical questions that would generate deeper insights into the data is still lacking (Müller et al., 2016). Abbasi, Sarker, and Chiang (2016) identified that fast-paced trends in areas such as complex event processing and social network analytics create opportunities for developing novel artifacts that can inject formalism into big data modeling. Our methodology contributes to the literature by offering a concise process for both practice and theoretical research, as it supports the exploration of both conceptual and analytical questions. These questions then drive the course of the analysis while identifying valuable insights from the data. Our CRISP-eSNeP process model is designed to support four broad facets of analytics research on big data generated from social networks: 1) formulating appropriate questions based on domain knowledge; 2) data acquisition and management; 3) data wrangling and validation; and 4) data analysis, knowledge generation, and deployment.