Explore chapters and articles related to this topic
Data Stream Mining for Big Data
Published in Himansu Das, Jitendra Kumar Rout, Suresh Chandra Moharana, Nilanjan Dey, Applied Intelligent Decision Making in Machine Learning, 2020
There are many ways to sample from a data stream. The simplest is uniform random sampling. However, uniform random sampling from a data stream is not straightforward, because the items in the stream are not all available at once and the stream size is usually not known in advance. Reservoir sampling addresses this: it samples items in a single pass such that, when one item is kept, each item is selected with probability 1∕n, where n is the number of items seen so far in the stream.
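The single-item case can be illustrated with a minimal Python sketch. The function name `sample_one` and the optional `rng` parameter are illustrative assumptions, not taken from the chapter; the sketch only shows how a 1∕n selection probability can be achieved without knowing n in advance.

```python
import random

def sample_one(stream, rng=random):
    """Return one item chosen uniformly from an iterable of unknown length.

    Hypothetical helper for illustration; returns None for an empty stream.
    """
    chosen = None
    for n, item in enumerate(stream, start=1):
        # The n-th item replaces the current choice with probability 1/n.
        if rng.randrange(n) == 0:
            chosen = item
    return chosen
```

After n items have been read, each of them is the current choice with probability 1∕n, so the stream can be cut off at any point and the result is still a uniform sample of what has been seen.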
Data Stream Management for CPS-based Healthcare: A Contemporary Review
Published in IETE Technical Review, 2022
Sadhana Tiwari, Sonali Agarwal
The reservoir sampling scheme maintains a fixed-size random sample of K items in a single pass over the data set. The first K items are placed in the size-K reservoir in order. Each subsequent item is accepted with probability K/n, where n is the number of items seen so far, including the new one. An accepted item displaces a randomly selected item in the reservoir. The time complexity of one-pass reservoir sampling is O(n(1+log(N/n))), where in this expression n denotes the sample size (K above) and N the total number of items; this is the expected time of the optimized variant that skips over items, whereas the basic procedure described here examines every item and takes O(N) time. Because the algorithm dynamically maintains a fixed-size sample from a data stream, it is well suited to the data stream model [103,105,114,115].
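The basic procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reviewed authors' implementation: the name `reservoir_sample`, the `rng` parameter, and the choice to draw a single random index for both the acceptance test and the slot to overwrite are all assumptions made here.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return up to k items drawn uniformly, without replacement, from stream.

    Hypothetical helper for illustration; processes the stream in one pass.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # j is uniform on 0..n-1, so j < k happens with probability k/n.
            j = rng.randrange(n)
            if j < k:
                # Given acceptance, j is uniform on 0..k-1, so the new item
                # displaces a uniformly chosen reservoir slot.
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 10)` returns 10 distinct integers from the range while holding only the reservoir in memory. If the stream has fewer than k items, the sketch returns all of them.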
Complex industrial automation data stream mining algorithm based on random Internet of robotic things
Published in Automatika, 2019
The reservoir sampling method is a simple random sampling method in which each data stream element is included in the sample set with the same probability. The core idea is as follows: maintain a sample of size m, called the “reservoir”. As the data stream elements entering the window are scanned, the n-th element is selected into the reservoir with probability m/n, and it replaces a time-stamped element already in the reservoir. The method assumes that the data stream elements are independently and identically distributed, so there is no need to consider correlation between elements, which makes implementation relatively simple.
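The equal-probability claim can be checked with a short induction. The derivation below is a sketch that assumes the replaced element is chosen uniformly at random among the m reservoir slots, which the excerpt does not state explicitly; the symbols x_i (the i-th stream element) and R_n (the reservoir after n elements) are notation introduced here.

```latex
\begin{align*}
% Base case: the first m elements fill the reservoir.
P(x_i \in R_m) &= 1 = \frac{m}{m}, && i \le m \\
% The newly arrived element is accepted with probability m/(n+1).
P(x_{n+1} \in R_{n+1}) &= \frac{m}{n+1} \\
% An earlier element survives unless the new one is accepted and lands on its slot.
P(x_i \in R_{n+1}) &= P(x_i \in R_n)\left(1 - \frac{m}{n+1}\cdot\frac{1}{m}\right)
  = \frac{m}{n}\cdot\frac{n}{n+1} = \frac{m}{n+1}, && i \le n
\end{align*}
```

So if every element seen so far is in the reservoir with probability m/n after n elements, the same holds with n+1 in place of n after the next element arrives.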