Data Collection and Analysis
Published in James William Martin, Lean Six Sigma for the Office, 2021
The types of data being created, analyzed, and stored have also become more diverse. In addition to structured data, which is held in parsed, mostly numeric fields such as those in an Excel spreadsheet, there is also unstructured data in the form of text, sound, pictures, and other formats. Statistical modeling now uses new algorithms to incorporate these diverse data types into predictive models, but the data requires conditioning to make it suitable for analysis. Examples include mining social media data to measure customer sentiment, building predictive models of what customers may purchase in the future, and identifying process issues mentioned in the text.
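As a rough illustration of what "conditioning" unstructured text can mean, the sketch below turns free-text comments into numeric word-count rows (a bag-of-words representation). This is only one possible conditioning step and is not taken from the book; the sample comments and the helper names `tokenize` and `bag_of_words` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical customer comments, invented for illustration only.
comments = [
    "Great service, fast delivery!",
    "Delivery was late and the service was poor.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(texts):
    """Return a sorted vocabulary and one word-count row per text."""
    token_lists = [tokenize(t) for t in texts]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    rows = []
    for toks in token_lists:
        counts = Counter(toks)
        # One numeric column per vocabulary word, in vocabulary order.
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

vocabulary, rows = bag_of_words(comments)
for comment, row in zip(comments, rows):
    print(row, "<-", comment)
```

Each row is now a structured, numeric record that a predictive model could take as input, which is the sense in which the unstructured text has been made suitable for analysis.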
Investigating surface condition classification of flexible road pavement using data mining techniques
Published in International Journal of Pavement Engineering, 2022
A. T. Olowosulu, J. M. Kaura, A. A. Murana, P. T. Adeke
Data mining is a mathematical technique used for examining hidden relationships between variables in a dataset (Smadi, 2000; Miradi, 2009). It is a machine learning approach that uses non-generative, black-box or exploratory models to examine (big) datasets for system classification and performance prediction (Fox, 2018). Examples of data mining methods include Rough Set theory, Artificial Neural Network (ANN), Decision Tree (DT), Support Vector Machine, Bayesian models, etc. (Gopalakrishnan et al., 2009; Miradi, 2009; Bal, 2013). These methods are used for the quantisation and mapping of uncertainties or vague analytical problems (Munakata, 2008; Miradi, 2009; Arabani et al., 2017). They are used for the analysis of imprecise, uncertain or incomplete datasets and knowledge-based elements with associated attributes, in order to discover patterns and relationships between variables (Pawlak, 1982, 1997, 2002; Miradi, 2009; Gopalakrishnan et al., 2009). They are also used to identify partial or total dependencies in a given dataset, to eliminate redundant data, and to provide an approach for handling null values, missing data, dynamic data, etc. Therefore, data mining techniques are suitable for extracting core information from a relatively poor dataset generated from system behaviour, such as pavement condition, for the purpose of performance prediction using specific models. The format of a dataset suitable for data mining techniques is usually presented in the form expressed in Equation (1), where the vector x represents an array of input or independent variables used for defining the target or dependent variable y. The target variable is usually a classified or predicted attribute from the dataset; the expression typically assumes a regression model. The use of data science and machine learning techniques has become the most effective and reliable method for analysing data generated from systems behaviour for information and knowledge discovery purposes (Witten and Frank, 2005; Miradi, 2009; Saltan et al., 2011; Hutter et al., 2019).
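The dataset format described above, an input vector x paired with a target y, can be sketched with one of the methods the excerpt names, the Decision Tree. The example below is a minimal illustration rather than the authors' model: it fits a single-split tree (a decision stump) by minimising weighted Gini impurity. The attribute names (`crack_area_pct`, `rut_depth_mm`), the values, and the class labels are hypothetical.

```python
# Hypothetical records: x = (crack_area_pct, rut_depth_mm), y = condition class.
# Attribute names and values are illustrative and not taken from the article.
dataset = [
    ((2.0, 3.0), "good"),
    ((4.0, 5.0), "good"),
    ((6.0, 4.0), "good"),
    ((15.0, 12.0), "poor"),
    ((22.0, 9.0), "poor"),
    ((30.0, 18.0), "poor"),
]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def majority(labels):
    """Most frequent label; ties are broken alphabetically for determinism."""
    return max(sorted(set(labels)), key=labels.count)

def fit_stump(data):
    """Find the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    n_features = len(data[0][0])
    for j in range(n_features):
        values = sorted({x[j] for x, _ in data})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in data if x[j] <= t]
            right = [y for x, y in data if x[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    return best[1:]

def predict(stump, x):
    """Classify an input vector x with a fitted stump."""
    j, t, left_label, right_label = stump
    return left_label if x[j] <= t else right_label

stump = fit_stump(dataset)
print("split on feature", stump[0], "at threshold", stump[1])
print(predict(stump, (3.0, 2.0)), predict(stump, (25.0, 14.0)))
```

A full decision tree would apply the same split search recursively to each side; the stump is kept here only to show how the independent variables in x are used to define the target y.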