Skip to content

Feature: Weighted reservoir sampling without replacement

This MR implements weighted reservoir sampling without replacement.

One can find the algorithm description (no code) in the original paper, and pseudocode on the wikipedia page for reservoir sampling. I attach the description of both algorithms given in the paper (the second of the two is implemented):

Algorithm A-Res Algorithm A-Res ExpJ

I note that I also change the key function to k_i = -log(u_i) / w_i, where w_i is the weight for index i and u_i is a random number drawn for index i, rather than k_i = u_i^{1/w_i} (as specified in the paper), as this prevents k_i blowing up for small weights. This required some small changes to the algorithm, which I think I've got correct - I could not find an explicit reference for this.

Specifically, I replace finding the lowest key with the largest key, and consequently I draw u_i2 from [a, thres_w] as the upper bound should get lower as the algorithm progresses:

! Before
u_i2 = random_number_real64(thres_w + a, b, seed(1))

! After change to log key function
u_i2 = random_number_real64(a, thres_w, seed(1))`

I did not change the definition of Xw.

One can find prototype testing on my Github repo (scroll to the bottom cells).

EDIT 11/9/2024

As I do not believe my choice of jump function Xw is consistent with modified key, k_i = -log(u_i) / w_i, I've switched to the original implementation described in the original paper. Consequently, the weights in the unit tests have been updated to reflect that one should not use small weights << 1.

Edited by Alex Buccheri

Merge request reports

Loading