Machine Learning
Published in Pedro Larrañaga, David Atienza, Javier Diaz-Rozo, Alberto Ogbechie, Carlos Puerto-Santana, Concha Bielza, Industrial Applications of Machine Learning, 2019
Pedro Larrañaga, David Atienza, Javier Diaz-Rozo, Alberto Ogbechie, Carlos Puerto-Santana, Concha Bielza
Data streams have intrinsic characteristics, such as possibly infinite volume, chronological order, and dynamic change. Machine learning methods for static datasets can scan the data many times, with unlimited time and memory resources, and produce fairly accurate results. In contrast, machine learning methods for data streams may produce approximate results and must satisfy constraints such as single-pass processing, real-time response, bounded memory, and concept drift detection (Nguyen et al., 2015). Each instance in a data stream is examined at most once and cannot be revisited, although this single-pass constraint can be relaxed slightly to allow an algorithm to remember instances in the short term. Many data stream applications require a real-time data-processing and decision-making response. The bounded memory constraint reflects the fact that only a small summary of the stream can be stored and computed over, and the rest of the data may have to be discarded. Finally, concept drift refers to the situation where the underlying data distribution changes over time, invalidating the current model, which then has to be updated.
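The single-pass and bounded-memory constraints can be made concrete with a small sketch (not from the excerpt; it uses the standard Welford online algorithm): the mean and variance of an unbounded numeric stream are maintained with only three scalars, and each instance is inspected exactly once.

```python
class RunningStats:
    """Single-pass, bounded-memory summary of a numeric data stream
    (Welford's online algorithm). Each instance is examined exactly once
    and then discarded; only three scalars are retained, regardless of
    how long the stream runs."""

    def __init__(self):
        self.n = 0        # instances seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; undefined for fewer than two instances.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
print(stats.mean, stats.variance())  # ≈ 5.0 and ≈ 4.571
```

Relaxing the single-pass constraint "in the short term", as the excerpt describes, would correspond to additionally keeping a small fixed-size buffer of recent instances alongside this summary.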
End-to-End Security Framework for Big Sensing Data Streams
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
Deepak Puthal, Surya Nepal, Rajiv Ranjan, Jinjun Chen
Data stream processing is an emerging computing paradigm that is particularly suitable for application scenarios where huge amounts of data (termed big data) must be processed in near real time, i.e., with minimal delay. Unlike traditional batch-processing systems, where queries are run over archived data (data that must be stored according to a predefined schema prior to processing), a stream processing engine (SPE) processes real-time streaming data on the fly. The need for on-the-fly processing arises from high-volume, high-velocity input data that cannot be persisted for later analysis for practical reasons (e.g., data storage overhead). A data stream manager (DSM) handles streams of tuples in much the same way that a conventional database system handles relations. In addition, the DSM undertakes security verification of the data blocks on the fly.
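The contrast with batch processing can be illustrated with a minimal sketch (sensor names and the threshold are hypothetical, not from the text): a standing filter query is evaluated over tuples as they arrive, rather than over a previously archived table.

```python
def continuous_filter(tuples, threshold):
    """Evaluate a standing filter query over tuples as they arrive.
    Nothing is archived: each tuple is inspected once and is either
    emitted downstream or dropped."""
    for sensor_id, reading in tuples:
        if reading > threshold:
            yield (sensor_id, reading)


readings = [("s1", 0.20), ("s2", 0.90), ("s1", 0.95), ("s3", 0.10)]
alerts = list(continuous_filter(iter(readings), threshold=0.80))
print(alerts)  # [('s2', 0.9), ('s1', 0.95)]
```

A real SPE would run such an operator continuously over an unbounded source; the finite list here only stands in for that source.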
Linear Programming (LP)-Based Two-Phase Classifier for Solving a Classification Problem with Multiple Objectives
Published in Ramakrishnan Ramanathan, Muthu Mathirajan, A. Ravi Ravindran, Big Data Analytics Using Multiple Criteria Decision-Making Models, 2017
Sakthivel Madankumar, Pusapati Navya, Chandrasekharan Rajendran, N. Srinivasa Gupta, B. Valarmathi
To handle the large volume of data, batch-processing technologies are used (e.g., Apache Hadoop), and to handle the high velocity of data, stream-processing technologies are used (e.g., Apache Spark or Apache Storm). These technologies help process big data to derive meaning from it and to convert unstructured data from various sources into structured data. Once we have structured data, we can apply the proposed LP-based classifiers to uncover the hidden patterns/decisions in this large volume of data, so that we can identify the non-dominated set of solutions with respect to multiple objectives.
Data Stream Management for CPS-based Healthcare: A Contemporary Review
Published in IETE Technical Review, 2022
Sadhana Tiwari, Sonali Agarwal
Data stream management (DSM) for cyber-physical system (CPS)-based healthcare is in focus these days and is considered an accelerating field of research. DSM using CPSs is very important for developing intelligent and smart healthcare systems [1–3]. Although considerable research has been carried out in this area, managing uncertainty in data streams is still a challenging problem in healthcare [4–6]. Data streams can be defined as continuous, rapidly growing sequences of data ordered in time, or real-time sequences of data. DSM systems are responsible for managing continuous queries over data streams in real time. Major applications of DSM include healthcare, network traffic, and fault management. The principal role of DSM is to provide quality of service (QoS); QoS mechanisms supported by DSM systems include load shedding, capacity planning, and scheduling.
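Load shedding, one of the QoS mechanisms listed above, can be sketched minimally as follows. This oldest-first eviction policy is purely illustrative; production DSM systems use more sophisticated random or semantic shedding strategies.

```python
from collections import deque

class SheddingQueue:
    """Input queue with an oldest-first load-shedding policy: when tuples
    arrive faster than the continuous query can consume them, the oldest
    unprocessed tuple is dropped so that latency stays bounded."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # deque evicts the oldest item when full
        self.shed = 0                      # number of tuples dropped so far

    def offer(self, tup):
        if len(self.buf) == self.buf.maxlen:
            self.shed += 1                 # the oldest tuple is about to be evicted
        self.buf.append(tup)


q = SheddingQueue(capacity=3)
for t in range(5):       # five arrivals, but only three slots
    q.offer(t)
print(list(q.buf), q.shed)  # [2, 3, 4] 2
```

The `shed` counter is the kind of signal a DSM's capacity planner could monitor to decide when to scale processing resources.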
Towards logistics 4.0: an edge-cloud software framework for big data analytics in logistics processes
Published in International Journal of Production Research, 2021
Moritz von Stietencron, Karl Hribernik, Katerina Lepenioti, Alexandros Bousdekis, Marco Lewandowski, Dimitris Apostolou, Gregoris Mentzas
IoT data calls for a new class of analytics, namely fast and streaming data analytics, to support applications with high-speed data streams that require time-sensitive actions (Kejariwal, Kulkarni, and Ramasamy 2015). Streaming analytics is the practice of applying analytics to a high throughput of data streams from multiple, disparate live data sources, in any data format, to identify patterns, discover new insights, predict future events, or make decisions (Mohammadi et al. 2018). Stream-oriented frameworks typically provide time-sensitive computations but incur relatively high data-processing costs on a continuous stream of IoT data. A pertinent issue in streaming analytics is balancing the trade-off between throughput and latency through data parallelism and incremental processing (Mohammadi et al. 2018). Stream-oriented processing architectures usually avoid putting data at rest. However, this is a non-trivial task, since the nature of IoT data streams changes over time, for instance with the geographical location of IoT devices (De Francisci Morales et al. 2016). Moreover, streaming analytical algorithms must work within limited resources (Cao and Wachowicz 2019). Streaming analytics of big data has had a significant impact on logistics (Borgi, Zoghlami, and Abed 2017; Hopkins and Hawking 2018).
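The throughput/latency trade-off under incremental processing can be made concrete with a tumbling-window average (a simplified sketch, not any particular framework's API): a larger window amortises per-result cost, but each result becomes available only when its window closes.

```python
def tumbling_window_avg(stream, window_size):
    """Incremental aggregation over tumbling windows. A larger window
    raises throughput (fewer results to emit and transmit) at the cost
    of latency (a result is only available once its window closes)."""
    total, count = 0.0, 0
    for x in stream:
        total += x          # incremental update: no window buffer is kept
        count += 1
        if count == window_size:
            yield total / count
            total, count = 0.0, 0


print(list(tumbling_window_avg(iter([1, 2, 3, 4, 5, 6]), window_size=3)))
# [2.0, 5.0]
```

Data parallelism would correspond to running independent instances of such an operator over partitions of the stream, e.g., one per IoT device.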
A clustering based variable sub-window approach using particle swarm optimisation for biomedical sensor data monitoring
Published in Enterprise Information Systems, 2021
Kun Lan, Simon Fong, Lian-Sheng Liu, Raymond K. Wong, Nilanjan Dey, Richard C. Millham, Kelvin K.L. Wong
In recent applications, the massive creation and continuous gathering of streaming data have motivated the development of a new branch of data mining known as data stream mining. Biosignals from sensors are among the most important non-stationary data streams: they can be measured, recorded, and analysed as fluctuations in the amplitude of bodily vital signs, and their pattern distribution evolves over time. Mining such a data stream is challenging because of its dynamically changing behaviour, and many papers in the literature report the need for new processing workflows and improved data stream analytics. A novel hybrid method of swarm intelligence search and variable sub-window partitioning is proposed and shown to obtain a scalable and suitable window range for arbitrary patterns along the time axis; it can also detect concept drift, i.e. content change, in evolving data. By exploiting the similarity measures and correlation matching of clustering techniques, the swarm search method finds better solutions more precisely in appropriately partitioned chunks of the stream (sub-windows) than conventional methods do. In this paper, the efficacy of the new model, known as CVW-PSO, is demonstrated in experiments on empirical biomedical stream datasets, with promising results showing higher overall performance. Our work is intended to inspire subsequent systematic research on heuristic intelligent pre-processing mechanisms for adaptive learning with evolving non-stationary data streams.
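A far simpler stand-in for the sub-window idea (a naive sketch, not the CVW-PSO method itself) is a two-half-window drift check: compare summary statistics of the older and newer halves of a sliding window and flag drift when they diverge beyond a threshold.

```python
from collections import deque

def flag_drift(stream, window=20, threshold=1.0):
    """Naive two-half-window concept drift check over a sliding window:
    yield the stream index whenever the mean of the newer half diverges
    from the mean of the older half by more than `threshold`."""
    buf = deque(maxlen=window)
    half = window // 2
    for i, x in enumerate(stream):
        buf.append(x)
        if len(buf) == window:
            items = list(buf)
            old_mean = sum(items[:half]) / half
            new_mean = sum(items[half:]) / (window - half)
            if abs(new_mean - old_mean) > threshold:
                yield i


# A synthetic signal whose level jumps from 0 to 5 at index 30.
drifts = list(flag_drift(iter([0.0] * 30 + [5.0] * 30)))
print(drifts[0])  # first index at which drift is flagged
```

The paper's contribution is precisely to replace the fixed window and threshold assumed here with sub-window boundaries found adaptively by particle swarm optimisation.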