Three-Dimensional Visualization
Published in Diego Galar, Uday Kumar, Dammika Seneviratne, Robots, Drones, UAVs and UGVs for Operation and Maintenance, 2020
Diego Galar, Uday Kumar, Dammika Seneviratne
Visual data exploration can be seen as a hypothesis generation process; the visualizations of the data allow the user to gain insight into the data and come up with new hypotheses. The verification of the hypotheses can be done via data visualization but may also be accomplished by automatic techniques from statistics, pattern recognition, or machine learning. In addition to the direct involvement of the user, the main advantages of visual data exploration over automatic data analysis techniques are the following: (1) visual data exploration can easily deal with highly nonhomogeneous and noisy data; (2) visual data exploration is intuitive and requires no understanding of complex mathematical or statistical algorithms or parameters; (3) visualization can provide a qualitative overview of the data, allowing data phenomena to be isolated for further quantitative analysis.
Data Mining – Unsupervised Learning
Published in Rakesh M. Verma, David J. Marchette, Cybersecurity Analytics, 2019
Rakesh M. Verma, David J. Marchette
Data exploration is the process of getting to know the data and assessing its quality. The first step in exploration is to generate a data quality report [231]. It should include separate tables or plots for continuous and categorical features, along with the time period of data collection for each subset of the data set and the sources. Often, the classification task becomes artificially easier when subsets of the data set were gathered at different times. For instance, in phishing email classification, if legitimate emails are gathered from recent years, say within the last five years, while phishing emails are gathered from more than a decade ago, then a classifier can pick up on this artifact of data collection and achieve a spuriously high accuracy.
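As an illustrative sketch (not taken from the cited text), a minimal data quality report along these lines can be assembled with pandas: separate summaries for continuous and categorical features, plus the collection period per class so that temporal artifacts like the one above become visible. The dataset and column names here are hypothetical:

```python
import pandas as pd

# Hypothetical toy email dataset: one continuous feature, one categorical
# label, and a collection timestamp documenting when each row was gathered.
df = pd.DataFrame({
    "body_length": [120, 450, 300, None, 210],          # continuous
    "label": ["ham", "phish", "ham", "phish", "ham"],   # categorical
    "collected": pd.to_datetime(
        ["2021-03-01", "2009-06-15", "2021-04-10", "2008-11-02", "2021-05-20"]
    ),
})

# Continuous features: count, mean, spread, quartiles, plus missing counts.
continuous_report = df.select_dtypes("number").describe().T
continuous_report["missing"] = df.select_dtypes("number").isna().sum()

# Categorical features: cardinality and most frequent value.
categorical_report = df[["label"]].describe().T

# Collection period per class -- the artifact described above shows up here:
# legitimate mail is recent while phishing mail is over a decade old.
period = df.groupby("label")["collected"].agg(["min", "max"])
print(continuous_report)
print(categorical_report)
print(period)
```

A report like `period` makes the temporal mismatch between classes explicit before any classifier is trained on it.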
Recent Trends of IoT and Big Data in Research Problem-Solving
Published in Shivani Agarwal, Sandhya Makkar, Duc-Tan Tran, Privacy Vulnerabilities and Data Security Challenges in the IoT, 2020
Pham Thi Viet Huong, Tran Anh Vu
Data exploration is the first step in data analysis; it examines the main features of a data set, such as its size, accuracy, missing values, and the initial information in the data. With the advances in technologies and the internet, a vast amount of data has become easily accessible for use in decision making. Data exploration, a time-consuming task, is the initial process of turning raw data into useful insights. The continuous and rapid change in the type and variety of data requires new data exploration and analytics techniques. In recent years, researchers have concentrated on creating new ways of exploring Big Data or have tried to modify existing exploration techniques to fit the current trend of IoT.
Current applications and future impact of machine learning in emerging contaminants: A review
Published in Critical Reviews in Environmental Science and Technology, 2023
Lang Lei, Ruirui Pang, Zhibang Han, Dong Wu, Bing Xie, Yinglong Su
Figure 3 illustrates the process-oriented workflow of ML, which involves data collection, exploration, preprocessing, modeling, validation, interpretation, and applicability-domain assessment. Data collection is the crucial and fundamental step for any algorithm, and improved data quality leads to better performance. Data exploration is an interpretive analysis of the sample data, aiming to describe its morphological characteristics and explain the relevance of the data. Data exploration and preprocessing are closely linked. Preprocessing includes steps such as handling missing values, attribute coding, data normalization and standardization, feature selection, and principal component analysis. Preprocessing accounts for 40–70% of the entire process, and by transforming raw data into analytically appropriate forms, it can improve data quality and adapt the data to specific algorithms.
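A preprocessing chain of the kind listed above can be sketched with scikit-learn; this is a minimal illustration under assumed toy data, not the workflow used in the review, chaining missing-value imputation, standardization, and principal component analysis:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical raw feature matrix with one missing entry (np.nan).
X = np.array([
    [1.0, 200.0, 0.5],
    [2.0, np.nan, 0.7],
    [3.0, 240.0, 0.9],
    [4.0, 260.0, 1.1],
])

# Preprocessing steps named above, chained in order:
# missing-value handling -> standardization -> principal component analysis.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # fill NaN with column mean
    ("scale", StandardScaler()),                  # zero mean, unit variance
    ("pca", PCA(n_components=2)),                 # dimensionality reduction
])

X_ready = pipe.fit_transform(X)
print(X_ready.shape)  # four samples reduced to two components
```

Wrapping the steps in a single `Pipeline` ensures the same transformations fitted on training data are applied consistently to new data.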
ML-EHSAPP: a prototype for machine learning-based earthquake hazard safety assessment of structures by using a smartphone app
Published in European Journal of Environmental and Civil Engineering, 2022
Ehsan Harirchian, Kirti Jadhav, Vandana Kumari, Tom Lahmer
Data exploration and visualization is the initial step in any data analysis, prediction, and organization task. It involves summarizing the main characteristics of a data set, such as its size, dates, and initial patterns, to understand what the data set contains. By observing data graphically, one can see whether two or more variables are correlated and determine whether they are good candidates for further analysis, including univariate, bivariate, or multivariate analysis. This step also deals with missing values and data cleaning. In this study, the data from both earthquakes have been visualized, and their correlations have been investigated. Data pre-processing is a crucial step for preparing the data for different purposes and for improving the quality of the raw experimental data (Bedia et al., 2018). It includes many essential steps, such as data cleaning, data transformation, and feature selection (Chakrabarty et al., 2015). Here, the data preparation step includes only standardization, as these datasets do not contain any categorical data.
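The two operations described here, checking pairwise correlations and standardizing numeric features, can be sketched as follows. The building-damage columns are hypothetical stand-ins, not the study's actual variables:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical building dataset (column names are illustrative only).
df = pd.DataFrame({
    "num_floors": [2, 4, 6, 8, 10],
    "height_m":   [6.1, 12.3, 18.0, 24.2, 30.5],
    "damage":     [0.1, 0.3, 0.4, 0.7, 0.9],
})

# Pairwise Pearson correlations: highly correlated variable pairs are
# candidates for bivariate/multivariate analysis (or redundant features).
corr = df.corr()
print(corr.round(2))

# Standardization, the only preparation step in this study's setup:
# rescale each numeric feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(df)
print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))
```

In practice the correlation matrix would be rendered as a heatmap to spot correlated pairs at a glance.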
Electromyography pattern-recognition based prosthetic limb control using various machine learning techniques
Published in Journal of Medical Engineering & Technology, 2022
Sushil Ghildiyal, Geetha Mani, Ruban Nersisson
The raw data is imported using Python libraries to formulate the dataset, and some data exploration is performed. Data exploration is the first and foremost phase in data analysis; it summarizes the key characteristics of the prepared dataset, such as initial patterns in the data, size, statistical parameters, and other features. The data is also cleaned by removing outliers and NaN (missing) values from the dataset. Generally, when there are missing values in the training dataset, the fit of a model is degraded, or the model becomes biased, because the behaviour of those values and their relationships with other variables cannot be analysed; the result may be wrong predictions or classifications. There are different methods to deal with NaN values: one is deletion, and the other is mean/mode/median imputation. In this work, the deletion (listwise deletion) technique is used to remove NaN or missing values from the dataset.
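The two NaN-handling strategies mentioned here can be contrasted in a short pandas sketch; the EMG feature names are hypothetical placeholders, not the paper's actual features:

```python
import numpy as np
import pandas as pd

# Hypothetical EMG feature table with missing (NaN) entries.
df = pd.DataFrame({
    "rms":   [0.42, np.nan, 0.55, 0.60],
    "mav":   [0.30, 0.28, np.nan, 0.45],
    "label": ["grip", "pinch", "grip", "pinch"],
})

# Listwise deletion (the approach used in this work): drop every row
# that contains at least one NaN.
df_deleted = df.dropna()

# The alternative mentioned above: mean imputation of numeric columns.
df_imputed = df.fillna(df.mean(numeric_only=True))

print(len(df_deleted))                 # rows with any NaN are removed
print(df_imputed.isna().sum().sum())   # no missing values remain
```

Listwise deletion is simple but discards whole samples; imputation keeps them at the cost of injecting estimated values, which is why the choice depends on how much data can be spared.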