Applying Data Mining in Smart Home
Bruno Bouchard in Smart Technologies in Healthcare, 2017
Data mining is the set of methods and algorithms that allow the exploration and analysis of databases (Ian H. Witten and Franck 2010). It draws on tools from statistics, artificial intelligence and database management systems. Data mining is used to find patterns, associations, rules or trends in datasets, usually in order to infer knowledge about the essential parts of the information (Quinlan and Ghosh 2006). It is often seen as a subtopic of machine learning. However, machine learning is typically supervised, since its goal is to simulate, in an intelligent system, the learning of known properties from experience (a training set). A human expert therefore usually guides the machine during the learning phase (Barlow 1989). In realistic situations, such guidance is often unavailable. While the two fields are similar in many ways, the goal of data mining is generally to discover previously unknown knowledge (Chaudhuri et al. 2011) that can then be exploited in intelligent systems and business intelligence to support better decisions.
Performance of Diverse Machine Learning Algorithms for Heart Disease Prognosis
Ayodeji Olalekan Salau, Shruti Jain, Meenakshi Sood in Computational Intelligence and Data Sciences, 2022
Health researchers have produced a vast collection of medical evidence that can be analyzed to extract useful information. Data mining techniques are methods for retrieving useful information from vast amounts of data [9]. Much of the data held in large medical databases is discrete [10], which makes decision-making based on these data a daunting challenge. Machine learning (ML), a subfield of data mining, excels at handling massive, well-formatted, normalized datasets. ML can be used to diagnose, track and forecast different diseases in the medical field [11], with the goal of easing this process and delivering successful care to patients while avoiding serious repercussions [12]. ML plays a critical role in detecting hidden discrete patterns and analyzing the data. Following data processing and dimensionality reduction, ML methods aid in the early detection and speedy diagnosis of heart disease. This chapter aims to test the efficacy and potential of numerous ML and deep learning [13] techniques for predicting cardiac disease at an early stage (Figure 1.1).
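The pipeline described above can be sketched in plain Python: data processing, then dimensionality reduction, then a classifier. This is a minimal illustration under stated assumptions and is not the chapter's method. It does three things:

- z-scores each feature;
- projects the data onto its first principal component, found by power iteration on the covariance matrix;
- assigns a new patient to the class whose mean projected score is closest (a nearest-centroid rule).

The helper names and the two-feature toy data are hypothetical and are not clinical values.

```python
import math


def fit_scaler(rows):
    """Per-feature mean and population standard deviation of the training rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant features
    return means, stds


def scale(rows, means, stds):
    """Z-score rows with statistics learned on the training set."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def first_component(rows, iters=100):
    """Leading principal axis of centred rows via power iteration on their covariance."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-uniform starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def project(rows, axis):
    """One score per row: its coordinate along the principal axis."""
    return [sum(x * a for x, a in zip(r, axis)) for r in rows]


def fit_centroids(scores, labels):
    """Mean projected score per class."""
    centroids = {}
    for label in set(labels):
        members = [s for s, lab in zip(scores, labels) if lab == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict(scores, centroids):
    """Nearest-centroid rule on the one-dimensional scores."""
    return [min(centroids, key=lambda c: abs(s - centroids[c])) for s in scores]


# Hypothetical toy data: two numeric features per patient; label 1 = heart disease.
X_train = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.9], [3.0, 4.1], [3.2, 4.0], [2.9, 3.9]]
y_train = [0, 0, 0, 1, 1, 1]

means, stds = fit_scaler(X_train)
Z_train = scale(X_train, means, stds)
axis = first_component(Z_train)
centroids = fit_centroids(project(Z_train, axis), y_train)

X_new = [[1.1, 2.0], [3.1, 4.0]]
print(predict(project(scale(X_new, means, stds), axis), centroids))  # [0, 1]
```

The scaling statistics and the principal axis are learned on the training rows only and then reused for new patients. This mirrors the usual rule of not fitting preprocessing on evaluation data.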
Clinical Data Analytics
Arvind Kumar Bansal, Javed Iqbal Khan, S. Kaisar Alam in Introduction to Computational Health Informatics, 2019
Data mining methods use clustering, association rule formation, regression analysis, summarization and dependency modeling to study the correlations between variables. The techniques are selected based on the overall goal of data mining. For example, clustering is used to group tumors into different classes, which are associated with different treatment regimens. Regression analysis is used to correlate 1) biomarker concentration with disease stage or 2) drug dosage with efficacy and/or toxicity. Summarization and association rule mining are used to derive association rules. Popular algorithms used for data mining include K-means clustering, multivariate linear regression analysis, decision trees, support vector machines, neural networks and Bayesian decision rules.
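To illustrate the clustering step, here is a minimal K-means sketch (Lloyd's algorithm) in plain Python. The points stand in for hypothetical two-dimensional tumor measurements. The function name `kmeans` is an assumption, as is seeding the centroids with the first k points, which keeps the example deterministic; k-means++ seeding or several random restarts are more common in practice.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until labels stop changing."""
    centroids = [list(p) for p in points[:k]]  # deterministic init with the first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
    return labels, centroids


# Hypothetical two-feature tumor measurements; the first two rows seed the clusters.
tumors = [[1.0, 1.0], [8.0, 8.0], [1.2, 0.8], [0.9, 1.1], [8.2, 7.9], [7.8, 8.1]]
labels, centroids = kmeans(tumors, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```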
Advances with support vector machines for novel drug discovery
Published in Expert Opinion on Drug Discovery, 2019
Vinicius Gonçalves Maltarollo, Thales Kronenberger, Gabriel Zarzana Espinoza, Patricia Rufino Oliveira, Kathia Maria Honorio
Overall, data mining can be defined as the automatic extraction of useful information from large databases using algorithms that discover patterns and correlations within these data sets. It has been said that ‘More data has been created in the past two years than in the entire previous history of the human race’ [30], a process driven mostly by consumer-oriented data recording by companies, but with applications across the most diverse fields. In the life sciences, the fields that have benefited most are cheminformatics, computational genomics and biomedical imaging [31]. Indeed, in cheminformatics, the availability of new computational resources (primarily hardware, for example, graphical processing units, or GPUs [32]) has enabled the use of computationally demanding algorithms such as deep learning (DL, as extensively reviewed in [33]), which employs feed-forward networks with multiple processing layers instead of the classical single-layer model. Using the prediction of drug solubility as a model, Cano demonstrated that GPU-based architectures can accelerate SVM up to 15-fold compared with its sequential implementation [34]. Despite the recent popularity of DL, classical ML techniques, SVM in particular, are still widely employed in drug discovery. Next, the main characteristics of SVM will be discussed along with exemplary medicinal chemistry studies that have applied this technique. Furthermore, some features of the SVM formulation will be presented.
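For readers new to the technique, a linear soft-margin SVM can be sketched by minimizing the regularized hinge loss with full-batch subgradient descent. This illustration rests on its own assumptions and is not the formulation used in the reviewed studies, which often rely on kernel SVMs and dedicated libraries. The function names, the hyperparameters (`lam`, `lr`, `epochs`) and the toy "active/inactive" descriptor data are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM: minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))
    by full-batch subgradient descent. Labels must be +1 or -1; the bias b is not regularised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active: point lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical 2-D molecular descriptors; +1 = active, -1 = inactive compound.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in [[1.0, 1.0], [-1.0, -2.0]]])  # [1, -1]
```

Only points that lie inside the margin contribute to the gradient. That is why the learned boundary is determined by a few "support" points rather than by the whole training set.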
A robust latent CUSUM chart for monitoring customer attrition
Published in Journal of Applied Statistics, 2023
Chunjie Wu, Zhijun Wang, Steven MacEachern, Jingjing Schneider
Constructing a prediction mechanism to monitor customer attrition [5,8,12,13] and analyzing its causal factors [14] have become popular topics in the past few years. Oghojafor et al. [14] used stepwise logistic regression to examine the effect of socio-economic factors on customer attrition, testing which factors caused subscribers to leave one service provider for another. The development of data mining techniques has enhanced the ability to predict customer attrition. Coussement and Poel [5] compared three classification techniques for distinguishing churners from non-churners and concluded that Random Forest is effective in improving predictive performance. He et al. [8] predicted commercial bank customer attrition with an SVM model improved by a random sampling method and showed that this approach effectively enhanced prediction accuracy. In addition, Qian et al. [17] proposed using a functional mixture model to profile customer behavior in order to identify and capture attrition patterns. López-Díaz et al. [11] introduced a new stochastic ordering to compare several classifiers used in commercial banking to analyze customer attrition.
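A minimal logistic-regression churn classifier can be sketched in plain Python, fitted by batch gradient descent on the log-loss. This only illustrates the model family that Oghojafor et al. [14] used. The stepwise variable selection they applied is not shown. The two features (scaled tenure and number of complaints), the data and the hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit P(churn | x) = sigmoid(w.x + b) by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For the log-loss, d(loss)/d(logit) = predicted probability - true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def churn_probability(w, b, x):
    """Predicted probability that customer x churns."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical customers: [tenure in years / 10, complaints in the last year]; 1 = churned.
X = [[0.1, 3.0], [0.3, 5.0], [0.2, 3.0], [0.9, 0.0], [0.8, 1.0], [0.7, 0.0]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
for customer in ([0.2, 4.0], [0.8, 0.0]):
    print(customer, round(churn_probability(w, b, customer), 3))
```

With real data, each fitted coefficient would be read as the change in the log-odds of churning per unit of its feature. A stepwise procedure would add or drop features according to a significance or information criterion.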
Modelling bus-pedestrian crash severity in the state of Victoria, Australia
Published in International Journal of Injury Control and Safety Promotion, 2021
Seyed Alireza Samerei, Kayvan Aghabayk, Nirajan Shiwakoti, Sajjad Karimi
In terms of analytical methods, various regression models have been used to model the occurrence of crashes (Lord & Mannering, 2010) and the severity of road users' injuries (Kaplan & Prato, 2012). In recent years, data mining techniques have been introduced to analyze large data sets of traffic crashes. Data mining encompasses several parametric and non-parametric techniques that can be used to analyze large amounts of data and extract hidden patterns (Kumar & Toshniwal, 2016). These techniques belong to the category of supervised data mining methods and are used to classify the dependent variable. Over the past decade, non-parametric data mining methods, including association rule discovery, have been used to analyze and find patterns in crash data (Amiri et al., 2016; Besharati & Tavakoli Kashani, 2018; Kumar & Toshniwal, 2015; Mirabadi & Sharifian, 2010; Montella, 2011; Nitsche et al., 2017; Pande & Abdel-Aty, 2009; Weng et al., 2016).
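To make association rule discovery concrete, the following is a small Apriori-style search in plain Python. It works in two stages:

- It finds itemsets whose support (the fraction of records that contain them) meets a threshold.
- It keeps the rules A → B whose confidence, support(A ∪ B) / support(A), meets a second threshold.

The crash attributes, the thresholds and the function names are hypothetical and are not drawn from the cited studies.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori: grow itemsets level by level, keeping those whose support meets min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in level if support(s) >= min_support}
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Rules A -> B with confidence = support(A ∪ B) / support(A) at or above the threshold."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[antecedent]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, confidence))
    return rules


# Hypothetical crash records, each a set of attribute values.
crashes = [
    {"night", "wet_road", "severe"},
    {"night", "wet_road", "severe"},
    {"night", "dry_road", "minor"},
    {"day", "wet_road", "minor"},
    {"night", "wet_road", "severe"},
    {"day", "dry_road", "minor"},
]

rules = association_rules(frequent_itemsets(crashes, min_support=0.3), min_confidence=0.8)
for antecedent, consequent, supp, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"support={supp:.2f} confidence={conf:.2f}")
```

The pruning step relies on the Apriori property: any itemset containing an infrequent subset cannot itself be frequent. This property is what keeps the search tractable on large crash databases.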