Data Stream Mining for Big Data
Published in Himansu Das, Jitendra Kumar Rout, Suresh Chandra Moharana, Nilanjan Dey, Applied Intelligent Decision Making in Machine Learning, 2020
Data stream mining is a challenging task because of the complex dynamics involved. In this chapter, I have presented algorithms for some basic problems in streaming data, such as filtering and counting, and have also discussed how to sample efficiently from a data stream and how to handle concept drift. Sampling and concept drift are fundamental problems that affect other stream mining tasks such as classification, clustering, and novelty detection. Therefore, having a good off-the-shelf sampler for streaming data makes downstream tasks easier, because they no longer need a separately devised sampling algorithm. Concept drift, in turn, often occurs together with class imbalance or novelty detection. DWM and CUSUM are classic algorithms that can be used to track concept drift efficiently, and Higia is a state-of-the-art model for such problems. Several problems in big streaming data remain unsolved, such as privacy preservation, handling incomplete and delayed information, and the analysis of complex data, to name a few (Krempl et al., 2014).
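The chapter's own implementations are not reproduced in this excerpt. As a minimal Python sketch of two of the building blocks named above, the following shows reservoir sampling (Algorithm R, one standard way to keep a uniform fixed-size sample from a stream of unknown length) and a one-sided CUSUM detector that signals an upward shift in the stream mean. The names `reservoir_sample` and `CusumDetector` and the parameters `k`, `drift`, and `threshold` are illustrative assumptions, not the chapter's code.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # the first k items fill the reservoir
        else:
            j = rng.randint(0, i)        # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item      # item i is kept with probability k / (i + 1)
    return reservoir


class CusumDetector:
    """One-sided CUSUM: signals an upward shift of the mean away from target_mean."""

    def __init__(self, target_mean: float, drift: float, threshold: float) -> None:
        self.target_mean = target_mean   # mean expected while the concept is stable
        self.drift = drift               # slack: deviations below this do not accumulate
        self.threshold = threshold       # alarm level for the cumulative sum
        self.s = 0.0

    def update(self, x: float) -> bool:
        """Feed one observation; return True if a change is signalled."""
        self.s = max(0.0, self.s + (x - self.target_mean - self.drift))
        if self.s > self.threshold:
            self.s = 0.0                 # restart monitoring after an alarm
            return True
        return False


# Usage: sample 5 items from a long stream, then monitor a stream whose mean jumps at t = 100.
sample = reservoir_sample(range(100_000), k=5, seed=42)
detector = CusumDetector(target_mean=0.0, drift=0.5, threshold=5.0)
stream = [0.0] * 100 + [2.0] * 20
alarms = [t for t, x in enumerate(stream) if detector.update(x)]
```

In the usage lines, each post-change value adds 2.0 - 0.0 - 0.5 = 1.5 to the cumulative sum, so the statistic first exceeds the threshold of 5 on the fourth sample after the jump (index 103). A smaller `drift` or `threshold` reacts sooner at the cost of more false alarms.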
Mining Ubiquitous Data Streams for IoT
Published in Ricardo Armentano, Robin Singh Bhadoria, Parag Chatterjee, Ganesh Chandra Deka, The Internet of Things, 2017
Nitesh Funde, Meera Dhabu, Ayesha Khan, Rahul Jichkar
Researchers have recently devoted considerable effort to big data analytics on ubiquitous devices and the IoT. The conceptual architecture of the ubiquitous data mining (UDM) process in IoT is shown in Figure 20.1. Ubiquitous devices such as sensor networks and smartphones contain various sensors; these inbuilt sensors (for example, the accelerometer in a smartphone) collect raw data that must be preprocessed. Data stream mining techniques, including classification, clustering, and frequent pattern mining (FPM), then extract useful information that can be visualized and used to make appropriate decisions. This knowledge can be shared with other ubiquitous devices in the network.
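As an illustration of the preprocessing and mining stages described above, here is a minimal Python sketch. It assumes smartphone accelerometer readings, summarises fixed-size windows by the mean and standard deviation of the acceleration magnitude, and clusters those window features with sequential (online) k-means as one possible stand-in for the clustering stage. The feature choice, window size, initial centroids, and all names are hypothetical; the chapter does not prescribe them, and the visualization, decision, and sharing stages are omitted.

```python
import math
from typing import List, Sequence, Tuple

Reading = Tuple[float, float, float]   # one accelerometer sample (x, y, z)
Feature = Tuple[float, float]          # (mean magnitude, std of magnitude)


def window_features(readings: Sequence[Reading], size: int) -> List[Feature]:
    """Preprocessing: cut raw readings into non-overlapping windows and
    summarise each window by the mean and standard deviation of its magnitude."""
    features: List[Feature] = []
    for start in range(0, len(readings) - size + 1, size):
        mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in readings[start:start + size]]
        mean = sum(mags) / size
        std = math.sqrt(sum((m - mean) ** 2 for m in mags) / size)
        features.append((mean, std))
    return features


class SequentialKMeans:
    """Mining step: online k-means; each new point nudges its nearest centroid."""

    def __init__(self, initial_centroids: List[Feature]) -> None:
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [1] * len(self.centroids)

    def update(self, point: Feature) -> int:
        """Assign the point to the nearest centroid, move that centroid, return its index."""
        idx = min(range(len(self.centroids)),
                  key=lambda i: sum((c - p) ** 2 for c, p in zip(self.centroids[i], point)))
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]                  # running-mean step size
        self.centroids[idx] = [c + rate * (p - c) for c, p in zip(self.centroids[idx], point)]
        return idx


# Usage: 50 readings at rest followed by 50 readings of vigorous motion.
still = [(0.0, 0.0, 9.81)] * 50
moving = [(0.0, 0.0, 5.0), (0.0, 0.0, 15.0)] * 25
features = window_features(still + moving, size=10)
model = SequentialKMeans([(9.0, 0.0), (11.0, 4.0)])
labels = [model.update(f) for f in features]
```

Here the resting windows fall into cluster 0 and the high-variance motion windows into cluster 1; a decision layer could then map each cluster to an activity state before the result is shared with other devices.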
Streaming Data Classification in Clustered Wireless Sensor Networks
Published in Ibrahiem M. M. El Emary, Anna Brzozowska, Shaping the Future of ICT, 2017
Manal Abdullah, Yassmeen Alghamdi
In WSNs, data are transmitted from the source to the destination over multiple routes, passing through multiple intermediate nodes. Increased traffic means that these nodes consume more energy, which can lead to intermediate node failure. Therefore, a reliable network topology should provide multiple paths from the source to the destination when required (Khan 2014). In some sensor network applications, the data that WSNs process arrive in an online fashion: they are unbounded, and there is no control over the order in which elements arrive. Such data are called data streams (de Aquino et al. 2007a, b). Sensor streams differ from traditional streams in several ways. Sensor streams are only samples of the entire population, and they are imprecise, noisy, and of moderate size. Traditional streams, by contrast, cover the entire population, and their data are exact, error-free, and huge (de Aquino et al. 2007a). The process of extracting knowledge structures from continuous data streams is called data stream mining. A data stream is an ordered sequence of instances that can be read only a few times using limited computing and storage capabilities (Sabit and Al-Anbuky 2014). WSNs generate massive data streams containing spatial information and sensor measurements, while the energy of the sensors is limited. Therefore, reducing the sensors' energy expenditure, and thereby extending the network lifetime, is the major challenge in such networks (Huang and Zhang 2011c). Stream mining algorithms can help WSNs save a great deal of energy. However, because of the sensors' resource constraints, data stream mining has to be performed in a distributed manner to achieve better energy conservation (Sabit and Al-Anbuky 2014). Furthermore, transmitting all sensor data to a central location over limited bandwidth consumes a large amount of energy.
This also calls for in-network distributed data processing (Huang and Zhang 2011c). Algorithms must converge on each limited chunk of data as fast as possible, so that the processor can take on the next portion of the stream (Sabit and Al-Anbuky 2014).
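To make the idea of in-network processing concrete, the following Python sketch assumes a clustered topology in which each cluster head merges compact summaries (count, sum, minimum, maximum) of its members' readings and forwards a single message to the sink instead of relaying every raw reading. This is a generic in-network aggregation pattern, not the chapter's classification scheme; the names `Summary` and `aggregate_at_heads` and the cluster layout are hypothetical, and only head-to-sink messages are counted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Summary:
    """Mergeable partial aggregate forwarded instead of raw readings."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(readings: List[float]) -> "Summary":
        # Assumes a non-empty list of readings.
        return Summary(len(readings), sum(readings), min(readings), max(readings))

    def merge(self, other: "Summary") -> "Summary":
        return Summary(self.count + other.count, self.total + other.total,
                       min(self.minimum, other.minimum), max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count


def aggregate_at_heads(clusters: Dict[str, Dict[str, List[float]]]) -> Tuple[Summary, int]:
    """Each cluster head merges its members' summaries and sends one message to the sink.
    Returns the sink's global summary and the number of messages the sink received."""
    global_summary: Optional[Summary] = None
    messages = 0
    for members in clusters.values():
        cluster_summary: Optional[Summary] = None
        for readings in members.values():
            s = Summary.of(readings)
            cluster_summary = s if cluster_summary is None else cluster_summary.merge(s)
        if cluster_summary is None:
            continue                      # empty cluster: nothing to forward
        messages += 1
        global_summary = (cluster_summary if global_summary is None
                          else global_summary.merge(cluster_summary))
    if global_summary is None:
        raise ValueError("no readings in any cluster")
    return global_summary, messages


# Usage: two clusters; the sink receives 2 summaries instead of 6 raw readings.
clusters = {
    "head_A": {"n1": [20.0, 21.0], "n2": [22.0]},
    "head_B": {"n3": [19.0, 23.0, 21.0]},
}
summary, messages = aggregate_at_heads(clusters)
raw_messages = sum(len(r) for members in clusters.values() for r in members.values())
```

The sink still recovers the exact global mean (21.0), minimum, and maximum because these summaries merge without loss, while receiving two messages rather than six. Statistics such as the median do not merge this simply.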
A clustering based variable sub-window approach using particle swarm optimisation for biomedical sensor data monitoring
Published in Enterprise Information Systems, 2021
Kun Lan, Simon Fong, Lian-Sheng Liu, Raymond K. Wong, Nilanjan Dey, Richard C. Millham, Kelvin K.L. Wong
In recent applications, the massive creation and continuous gathering of streaming data have motivated the development of a new branch of data mining known as data stream mining. Biosignals from sensors are among the most important non-stationary data streams: they can be measured, recorded, and analysed as fluctuations in the amplitude of bodily vital signs, and they contain time-evolving pattern distributions. Mining such a data stream is challenging because of its dynamically changing behaviour. Many papers in the literature report the need for new processing workflows and improved data stream analytics. A novel hybrid method combining swarm intelligence search with variable sub-window partitioning is proposed and shown to obtain a scalable and suitable window range for arbitrary patterns laid along the time axis. It is also able to detect concept drift or content change in evolving data. Using the similarity measures and correlation matching of clustering techniques, the swarm search method can find better solutions more precisely than conventional methods in appropriately partitioned chunks of the stream, called sub-windows. In this paper, the efficacy of the new model, known as CVW-PSO, is shown in experiments on empirical biomedical stream datasets. The experimental results are promising and show higher overall performance. Our work is intended to inspire subsequent systematic research on heuristic, intelligent pre-processing mechanisms for adaptive learning from evolving non-stationary data streams.
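CVW-PSO itself combines clustering-based similarity and correlation matching with PSO, and its details are not given in this abstract. The following Python sketch only illustrates the underlying idea of using a swarm search to place a sub-window boundary: a generic global-best PSO searches for the cut index that minimizes the total within-segment sum of squared errors, so the boundary tends to land where the signal's level changes. The fitness function `split_cost`, the parameters (`w`, `c1`, `c2`, swarm size), and all names are illustrative assumptions, not the authors' method.

```python
import random
from typing import List


def sse(segment: List[float]) -> float:
    """Sum of squared deviations from the segment mean (within-sub-window spread)."""
    if not segment:
        return 0.0
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def split_cost(signal: List[float], cut: int) -> float:
    """Hypothetical fitness: total spread of the two sub-windows created by a cut."""
    return sse(signal[:cut]) + sse(signal[cut:])


def pso_best_cut(signal: List[float], min_len: int = 5, particles: int = 20,
                 iterations: int = 60, w: float = 0.7, c1: float = 1.5,
                 c2: float = 1.5, seed: int = 0) -> int:
    """Global-best PSO over a single continuous variable, rounded to a cut index."""
    rng = random.Random(seed)
    lo, hi = float(min_len), float(len(signal) - min_len)

    def cost(x: float) -> float:
        return split_cost(signal, int(round(x)))

    pos = [rng.uniform(lo, hi) for _ in range(particles)]
    vel = [0.0] * particles
    pbest = pos[:]                                    # each particle's best position
    pbest_cost = [cost(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g], pbest_cost[g]       # swarm's best position
    for _ in range(iterations):
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)  # keep the cut inside the window
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i], c
    return int(round(gbest))


# Usage: the level of the signal changes at index 60.
signal = [1.0] * 60 + [4.0] * 40
cut = pso_best_cut(signal)
```

For a signal of this size an exhaustive scan over all cuts would be cheap; a swarm search becomes more attractive when several boundaries or window lengths must be tuned jointly, as with the variable sub-windows described in the abstract.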
Online Learning Model for Handling Different Concept Drifts Using Diverse Ensemble Classifiers on Evolving Data Streams
Published in Cybernetics and Systems, 2019
Data stream mining refers to the extraction of knowledge from continuously arriving data streams. The primary process in data mining is to extract hidden knowledge or patterns from the stream data. Several methods, such as clustering, classification, and regression algorithms, have been proposed to collect, analyze, and visualize information from stream data (Nguyen, Woon, and Ng 2015). Unlike algorithms for static datasets, data stream mining algorithms face many challenges, such as limited memory, response time, and concept-drift detection. Ensemble classification algorithms of various types, organized into different taxonomies, have been developed to learn from time-changing data streams. Each of these techniques has its own advantages and limitations in handling concept drift over continuously arriving stream data.
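As a minimal Python sketch of how a weighted ensemble can follow concept drift, the following uses the general scheme of Dynamic Weighted Majority (DWM): experts that err have their weights multiplied by `beta`, weights are rescaled so the largest is 1, experts whose weight falls below `theta` are removed, and a fresh expert is added whenever the ensemble itself errs. It is not the model proposed in this article. The one-dimensional nearest-centroid base learner, the parameter values, and the synthetic stream with a label flip at t = 200 are hypothetical.

```python
from typing import Dict, List, Optional


class NearestCentroid:
    """Hypothetical base learner: incremental per-class mean of a 1-D feature."""

    def __init__(self) -> None:
        self.sums: Dict[int, float] = {}
        self.counts: Dict[int, int] = {}

    def predict(self, x: float) -> Optional[int]:
        if not self.counts:
            return None                                   # untrained expert has no opinion
        return min(self.counts, key=lambda c: abs(x - self.sums[c] / self.counts[c]))

    def learn(self, x: float, y: int) -> None:
        self.sums[y] = self.sums.get(y, 0.0) + x
        self.counts[y] = self.counts.get(y, 0) + 1


class DynamicWeightedMajority:
    def __init__(self, beta: float = 0.5, theta: float = 0.01, period: int = 1) -> None:
        self.beta, self.theta, self.period = beta, theta, period
        self.experts: List[NearestCentroid] = [NearestCentroid()]
        self.weights: List[float] = [1.0]
        self.t = 0

    def step(self, x: float, y: int) -> Optional[int]:
        """Predict the label of x, then update weights and experts with the true label y."""
        update = self.t % self.period == 0
        votes: Dict[int, float] = {}
        for j, expert in enumerate(self.experts):
            label = expert.predict(x)
            if label != y and update:
                self.weights[j] *= self.beta              # penalise a wrong expert
            if label is not None:
                votes[label] = votes.get(label, 0.0) + self.weights[j]
        prediction = max(votes, key=votes.get) if votes else None
        if update:
            top = max(self.weights)
            self.weights = [w / top for w in self.weights]  # largest weight becomes 1
            keep = [j for j, w in enumerate(self.weights) if w >= self.theta]
            self.experts = [self.experts[j] for j in keep]
            self.weights = [self.weights[j] for j in keep]
            if prediction != y:
                self.experts.append(NearestCentroid())    # fresh expert for the new concept
                self.weights.append(1.0)
        for expert in self.experts:
            expert.learn(x, y)
        self.t += 1
        return prediction


# Usage: the labelling rule flips at t = 200 (abrupt concept drift).
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
model = DynamicWeightedMajority()
correct = []
for t in range(400):
    x = xs[t % 6]
    y = int(x > 0.5) if t < 200 else int(x < 0.5)
    correct.append(model.step(x, y) == y)
```

In this run the ensemble errs briefly right after the flip, the pre-drift experts are pruned within a few steps, and the experts added after the drift classify the last 100 examples correctly. Mixing different base learners would give the kind of diverse ensemble the article's title refers to.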