Reservoir sampling

Reservoir sampling is a method for maintaining a dynamic random sample from a data stream. It is a simple random sampling method in which each stream element is selected into the sample set with the same probability: 1/n for a sample of one item, where n is the stream size, or more generally k/n for a sample of k items. Because it needs only a single pass and a fixed-size sample, it is particularly useful for massive data sets.

From: Data Classification [2019], Complex industrial automation data stream mining algorithm based on random Internet of robotic things [2019], Advanced Data Structures [2019], Applied Intelligent Decision Making in Machine Learning [2020]

Data Stream Mining for Big Data

Published in Himansu Das, Jitendra Kumar Rout, Suresh Chandra Moharana, Nilanjan Dey, Applied Intelligent Decision Making in Machine Learning, 2020

Chandresh Kumar Maurya

There are many ways to sample from a data stream. The simplest is uniform random sampling. However, uniform random sampling is not straightforward on a data stream, because the items in the stream are not all available at once. Reservoir sampling addresses this: it samples items from a data stream such that the probability of selecting any given item is 1/n, where n is the stream size.
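To make the 1/n property concrete, the following is a minimal sketch of the single-item case, assuming the stream is any Python iterable. The function name `sample_one` and the optional `rng` parameter are illustrative choices, not taken from the chapter.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def sample_one(stream: Iterable[T], rng: Optional[random.Random] = None) -> Optional[T]:
    """Return one item chosen uniformly at random from a stream of unknown length.

    Returns None for an empty stream. `rng` is a hypothetical hook that lets
    callers pass a seeded generator for reproducibility.
    """
    if rng is None:
        rng = random.Random()
    chosen: Optional[T] = None
    for n, item in enumerate(stream, start=1):
        # The n-th item replaces the current choice with probability 1/n.
        # After the whole stream has passed, every item has been kept with
        # probability 1/n, where n is the final stream size.
        if rng.randrange(n) == 0:
            chosen = item
    return chosen
```

Only the current choice and a counter are kept in memory, so the stream never has to be stored or traversed twice. For example, `sample_one(open_log_lines)` (with `open_log_lines` standing in for any iterator) would return one line chosen uniformly at random.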

Data Stream Management for CPS-based Healthcare: A Contemporary Review

Published in IETE Technical Review, 2022

Sadhana Tiwari, Sonali Agarwal

This reservoir sampling scheme maintains a fixed-size random sample of K items in a single pass over the data set. The first K items of the data set are placed sequentially in a reservoir of size K. Each subsequent item is accepted with probability K/n, where n is the size of the data set after that item arrives. An accepted item replaces a randomly selected item in the reservoir. The time complexity of one-pass reservoir sampling is O(n(1+log(N/n))), where, in this expression only, n denotes the sample size and N the number of items in the data set. The algorithm dynamically maintains a fixed-size sample of K data items from a data stream, and is hence suitable for the data stream model [103,105,114,115].
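The scheme described above can be sketched as follows. This is a minimal illustration of the basic replace-on-accept variant, which inspects every item and therefore runs in time linear in the stream length; it is not the optimized skip-based variant that achieves the O(n(1+log(N/n))) bound. The name `reservoir_sample` and the `rng` parameter are assumptions made for the example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of up to k items from a single pass.

    If the stream has fewer than k items, all of them are returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir in arrival order.
            reservoir.append(item)
        else:
            # Draw j uniformly from [0, n). The item is accepted exactly
            # when j < k, which happens with probability k/n, and j then
            # names a uniformly chosen slot to overwrite.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir
```

A single random draw per item serves both purposes here: it decides acceptance with probability K/n and, on acceptance, selects which reservoir slot is replaced. A call such as `reservoir_sample(range(1_000_000), 100)` would return 100 distinct integers without ever holding the full range in memory.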

Complex industrial automation data stream mining algorithm based on random Internet of robotic things

Published in Automatika, 2019

Lianhe Cui

The reservoir sampling method is a simple random sampling method in which each data stream element is selected into the sample set with the same probability. The core idea is as follows: maintain a sample of size m, called the "reservoir". As the data stream elements entering the window are scanned, the n-th element is selected into the reservoir with probability m/n, and when it is selected it replaces one of the time-stamped elements already in the reservoir. The reservoir sampling method assumes that the data stream elements are independently and identically distributed, so there is no need to consider correlations between stream elements, which makes implementation relatively simple.
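A short induction shows why admitting the n-th element with probability m/n gives every element the same chance of being in the sample. It rests on two assumptions that the excerpt does not state explicitly: the evicted element is chosen uniformly among the m reservoir slots, and n > m. The inductive hypothesis is that after n-1 elements each one is in the reservoir with probability m/(n-1), which holds trivially at n-1 = m because the reservoir then contains every element.

```latex
\begin{align*}
P(\text{element } n \text{ enters the reservoir})
  &= \frac{m}{n}, \\
P(\text{a resident element survives step } n)
  &= 1 - \frac{m}{n}\cdot\frac{1}{m} = \frac{n-1}{n}, \\
P(\text{an earlier element is in the reservoir after step } n)
  &= \frac{m}{n-1}\cdot\frac{n-1}{n} = \frac{m}{n}.
\end{align*}
```

After step n, the new element and every earlier element are therefore each in the reservoir with probability m/n.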
