Big Data and IoT Forensics
Published in Vijayalakshmi Saravanan, Alagan Anpalagan, T. Poongodi, Firoz Khan, Securing IoT and Big Data, 2020
Forensic investigations involve big data arriving from diverse IoT devices. Data mining techniques are critical for finding potentially interesting patterns and extracting hidden value from massive, streaming data sets. However, conventional data mining approaches, such as descriptive and predictive analytics, suffer from problems of scalability, efficiency, and precision when applied to large, dynamic data sets. Because of the volume, velocity, and variability of streams, it is not feasible to store them indefinitely and analyse them afterwards. Researchers therefore seek to optimize existing solutions so that they produce accurate results by processing data samples in a timely manner within the limited memory of IoT devices. Beyond the time-variant nature of stream data, stream mining must also cope with the concept drift problem: a change in the distribution of the data over time. Experiments on data streams have shown that a change in the underlying concept degrades the performance of classification models. Consequently, several data mining approaches, such as classification and clustering, have been extended with drift detection mechanisms, and improved algorithms have been designed to identify and adapt to concept drift in dynamic environments [40, 41].
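To make the drift-detection idea concrete, the following is a minimal, self-contained sketch in the spirit of the Drift Detection Method (DDM) of Gama et al. (2004): it monitors a streaming classifier's 0/1 error rate and raises a warning or drift signal when the rate rises significantly above its historical minimum. The class name, thresholds, and warm-up length are illustrative assumptions, not the implementation from [40, 41].

```python
import math

class SimpleDDM:
    """Drift detector in the spirit of DDM (Gama et al., 2004):
    monitor a streaming classifier's error rate and flag drift when
    it rises well above its historical minimum."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples   # warm-up before testing for drift
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0                     # running error-rate estimate
        self.p_min = float("inf")        # error rate at its historical minimum
        self.s_min = float("inf")        # std deviation at that minimum

    def update(self, error):
        """error: 1 if the classifier misclassified this instance, else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n            # incremental mean
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:       # new best performance
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + 3.0 * self.s_min:
            self.reset()                               # confirmed drift
            return "drift"
        if self.p + s >= self.p_min + 2.0 * self.s_min:
            return "warning"
        return "stable"
```

In practice the detector wraps an incremental classifier; on a drift signal, the classifier is retrained on recent instances so that it adapts to the new concept.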
Machine Learning
Published in Pedro Larrañaga, David Atienza, Javier Diaz-Rozo, Alberto Ogbechie, Carlos Puerto-Santana, Concha Bielza, Industrial Applications of Machine Learning, 2019
Data streams have intrinsic characteristics, such as possibly infinite volume, chronological order, and dynamic changes. Machine learning methods for static datasets can scan the data many times with unlimited time and memory resources and produce fairly accurate results. In contrast, machine learning methods for data streams may produce approximate results and have to satisfy constraints such as single-pass processing, real-time response, bounded memory, and concept drift detection (Nguyen et al., 2015). Each instance in a data stream is examined at most once and cannot be backtracked; this single-pass constraint can be relaxed slightly to allow an algorithm to remember instances in the short term. Many data stream applications require a real-time data processing and decision-making response. The bounded memory constraint reflects the fact that only a small summary of the stream can be stored and computed on, and the rest of the data may have to be discarded. Finally, concept drift refers to the situation in which the underlying data distribution changes over time, invalidating the current model, which then has to be updated.
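The single-pass and bounded-memory constraints can be illustrated with a classic stream-summarization technique. The sketch below implements reservoir sampling, which keeps a fixed-size uniform sample of a possibly unbounded stream while inspecting each instance exactly once; it is offered as an illustration of the constraints discussed above, not as a method from this chapter.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Maintain a uniform random sample of size k from a stream,
    touching each instance exactly once and using O(k) memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # keep the item with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Example: a bounded-memory summary of a million-element stream
sample = reservoir_sample(range(1_000_000), k=100)
```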
Conclusion
Published in Arun Reddy Nelakurthi, Jingrui He, Social Media Analytics for User Behavior Modeling, 2020
As discussed in the introduction (Chapter 1), social media data is dynamic in nature. Content discussed on the various healthcare-specific social media forums changes from time to time, and the topics of discussion evolve over time, a phenomenon called concept drift [Sun et al., 2018]. Concept drift is common in medical informatics, financial data analysis, and social networks. In existing research, incremental learning, which updates learning machines (models) when a chunk of new training data arrives, is the major learning paradigm for tackling such tasks. In particular, the learning machines should be updated without access to previous data, so that there is no need to store the previous data or relearn the model from it. Most research on addressing concept drift falls into three categories: (1) using a sliding-window technique to train the models and give importance to recent data, (2) modeling concept drift by considering data chunks at various time intervals, and (3) creating an ensemble of models from consecutive time stamps and building a predictive model as a function of the ensemble. Ensemble models have been shown to handle concept drift better [Xie et al., 2017]. Our work on adapting off-the-shelf classifiers does not directly address the temporal dynamics of social media data. Given data chunks D1, …, Dt over t sequential time steps, existing ensemble-based methods can learn t sub-tasks, each of which can be regarded as an adapted model (base learner); these base learners at different time stamps can then be leveraged to address concept drift, as sketched below.
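A minimal sketch of the third category, in the spirit of accuracy-weighted ensembles [Xie et al., 2017]: one base learner is trained per data chunk D1, …, Dt, and all learners are weighted by their accuracy on the most recent chunk. The use of scikit-learn's LogisticRegression as the base learner, and the assumption of binary labels, are illustrative choices rather than details from this book.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class ChunkEnsemble:
    """One base learner per data chunk, weighted by accuracy on the
    newest chunk (binary labels assumed for this sketch)."""

    def __init__(self):
        self.models, self.weights = [], []

    def add_chunk(self, X, y):
        """Fit a new base learner on chunk (X, y), then reweight all
        learners by their accuracy on this most recent chunk."""
        model = LogisticRegression(max_iter=1000).fit(X, y)
        self.models.append(model)
        self.weights = [m.score(X, y) for m in self.models]

    def predict(self, X):
        """Weighted average of the per-chunk learners' positive-class
        probabilities, thresholded at 0.5."""
        votes = np.zeros(len(X))
        total = sum(self.weights)
        for m, w in zip(self.models, self.weights):
            votes += (w / total) * m.predict_proba(X)[:, 1]
        return (votes >= 0.5).astype(int)
```

Weighting by accuracy on the newest chunk means that base learners trained under an outdated concept are automatically down-weighted as the data distribution shifts.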
Realising the promises of artificial intelligence in manufacturing by enhancing CRISP-DM
Published in Production Planning & Control, 2023
Jon Bokrantz, Mukund Subramaniyan, Anders Skoogh
AI drift takes two forms: data drift and concept drift. Data drift occurs when the training data used in the modelling phase no longer adequately represent the state of reality, which can happen for two broad reasons: (1) stochastic changes in production system behaviour (e.g. machine degradation or dynamic changes in machine interactions) that alter the stationarity and distribution of the data, or (2) data corruption (e.g. erroneous sensor output or errors incurred during data pipeline transfer) that introduces faulty data points. Concept drift occurs when the underlying dynamics of the production process change, e.g. when new products are introduced, new machines are installed, or the production flow is altered. Over time, AI drift may cause AI performance to deteriorate, sometimes to the point where the AI solution no longer holds adequate predictive power for interpreting unfamiliar data.
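A common way to operationalize the data-drift check described above is to compare each feature's live distribution against the training distribution. The sketch below uses a two-sample Kolmogorov–Smirnov test via scipy.stats.ks_2samp; the function name, significance level, and synthetic example are assumptions for illustration, not part of the enhanced CRISP-DM proposed in the article.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_data_drift(train_X, live_X, alpha=0.01):
    """Return the indices of features whose live distribution differs
    significantly from the training distribution (two-sample KS test)."""
    drifted = []
    for j in range(train_X.shape[1]):
        stat, p_value = ks_2samp(train_X[:, j], live_X[:, j])
        if p_value < alpha:              # distributions differ significantly
            drifted.append(j)
    return drifted

# Example: sensor readings shift after machine degradation
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(5000, 3))
live = train.copy()
live[:, 1] += 0.5                        # feature 1 has drifted
print(detect_data_drift(train, live))    # -> [1]
```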
Concept Drift Monitoring and Diagnostics of Supervised Learning Models via Score Vectors
Published in Technometrics, 2023
Kungang Zhang, Anh T. Bui, Daniel W. Apley
In general, the methods in the concept drift literature can be categorized into two classes (Tsymbal 2004; Harel et al. 2014): (a) model adaptation (or online learning) methods and (b) concept drift detection methods. Model adaptation methods mainly focus on maintaining the performance of machine learning models in the presence of concept drift, without formally detecting or diagnosing the drift (Wang et al. 2003; Tsymbal et al. 2008; Gonçalves Jr et al. 2014; Barros and Santos 2018). To maintain a good prediction metric related to classification or regression error, models are automatically and continuously updated (i.e., adapted) online as new observations are collected, which is sometimes called online or incremental learning. This class of methods is not particularly relevant to our work, since our goal is concept drift detection and diagnosis, not model adaptation. In fact, we view our approach as something that can be used in conjunction with model adaptation methods to make them more efficient and interpretable. In particular, the model adaptation could be turned on only when the concept drift detection component has indicated that the underlying concept has changed.
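The complementary use suggested above, detection gating adaptation, can be sketched as a small control loop in which the model is retrained on a buffer of recent instances only when the detector fires, rather than continuously. The sketch reuses the SimpleDDM detector from earlier in this section; the function signature and window size are illustrative assumptions, not the score-vector method of this article.

```python
from collections import deque

def monitor_and_adapt(stream, fit, window_size=500):
    """stream yields (x, y) pairs; fit(instances) returns a trained model
    with a .predict(x) method. Retrain only when drift is detected."""
    buffer = deque(maxlen=window_size)   # recent instances for retraining
    detector = SimpleDDM()               # detector sketched earlier
    model = None
    for x, y in stream:
        buffer.append((x, y))
        if model is None:
            if len(buffer) == window_size:
                model = fit(list(buffer))        # initial model
            continue
        error = int(model.predict(x) != y)       # 0/1 prediction error
        if detector.update(error) == "drift":    # detector fires:
            model = fit(list(buffer))            # adapt on recent data only
            detector = SimpleDDM()               # fresh detector post-adaptation
    return model
```

Compared with continuous online updating, this design retrains only when warranted, which is both cheaper and more interpretable: every retraining event corresponds to a detected change in the concept.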
Frontiers of medical decision-making in the modern age of data analytics
Published in IISE Transactions, 2023
Concept drift refers to changes in the data that occur over time (dataset shift) and can degrade the accuracy of models (Finlayson et al., 2021). As an example, consider a predictive model that estimates the risk that a patient newly admitted to a hospital with COVID-19 will require a ventilator. A model created using data from near the start of the pandemic (spring 2020) loses accuracy over time due to an epidemiological effect known as harvesting, which refers to the fact that the most at-risk patients succumb to a novel disease like COVID-19 early in a pandemic. As a result, the probability of a patient requiring a ventilator will eventually be overestimated unless the model is recalibrated over time (for an example of work in this area, see Otles et al. (2021)). This issue raises many questions: What performance measures are most appropriate for assessing model accuracy? How frequently should models be evaluated? What is the optimal approach for updating models over time? Some of these questions cross boundaries between quality control engineering and stochastic dynamic programming, with the synergistic goals of detecting and correcting performance degradation in optimal policies over time.
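One concrete form of the recalibration mentioned above is Platt-style logistic recalibration: the original risk model is kept fixed, and only the mapping from its scores to probabilities is refit on outcomes from a recent time window. The sketch below is an illustrative assumption about how such an update could look, not the approach of Otles et al. (2021).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def recalibrate(base_scores, outcomes):
    """Fit a one-dimensional logistic map from the fixed base model's
    risk scores to outcomes observed in a recent time window."""
    calibrator = LogisticRegression()
    calibrator.fit(base_scores.reshape(-1, 1), outcomes)
    return calibrator

# Example: scores from an early-pandemic model, outcomes from a recent
# window where true ventilation risk is lower than the scores suggest
rng = np.random.default_rng(1)
scores = rng.uniform(0, 1, size=2000)
outcomes = rng.binomial(1, 0.5 * scores)
cal = recalibrate(scores, outcomes)
adjusted = cal.predict_proba(scores.reshape(-1, 1))[:, 1]
```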