You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If you want to sample k items out of n, Fisher Yates shuffles your item array.
Hereunder is an efficient no shuffle implementation. Something better can probably be achieved with prefix sums and rank/select.
Theorem
We can sample k items out of n with k random draws and O(k log k) time, without modifying the items array.
Proof
TODO fix proof
Given n items, at draw i=1,2,...,n with a binary search tree containing all previously drawn items indices between 0 and n-1, pick a random integer x between 0 and n - i. Binary search for its corresponding index value as follows: start with p = 0 the number of predecessors of the current node, let l be the size of the left subtree of the current node, let v be the value of the current node, if x + p + l + 1 < v
the current node becomes the left child, otherwise the current node becomes the right child and p = p + l + 1. If there is no such child, insert x + p in the case of a left child, x + p + l + 1 in the case of a right child.
If x + p + l + 1 < v, then x cannot be in the right subtree of v. The item index corresponding to x would be x + p + l + 1 if it was the right child of v which is smaller than v. Pick any node c in the right subtree of v. Let g be the number of predecessors of c in the right subtree of v. The value of c is at least v + 1 + g because all indices are distinct. Suppose c has no right child. The item index of x as a right child of c is x + p + l + 1 + g + 1 < v + g + 1 <= c. Suppose c has no left child. The item index of x as a left child of c is x + p + l + 1 + g < v + g < v + g + 1 <= c.
If x + p + l + 1 > v, then x cannot be in the left subtree of v. If v has no left child, then l = 0 and the value of x as the left child of v would be x + p which is larger or equal to v (indices must be distinct). Pick any node c in the left subtree of v.
Let s be the number of successors of s in the left subtree of v. The value of c is at most v - s - 1 because all indices are distinct. Suppose c has no left child. The item index of x as a left child of c is x + p + l - s - 1 > v - s - 2 so x + p + l - s - 1 >= v - s - 1 = c.
The text was updated successfully, but these errors were encountered:
Uh oh!
There was an error while loading. Please reload this page.
If you want to sample k items out of n, Fisher Yates shuffles your item array.
Hereunder is an efficient no shuffle implementation. Something better can probably be achieved with prefix sums and rank/select.
Theorem
We can sample k items out of n with k random draws and O(k log k) time, without modifying the items array.
Proof
TODO fix proof
Given n items, at draw i=1,2,...,n with a binary search tree containing all previously drawn items indices between 0 and n-1, pick a random integer x between 0 and n - i. Binary search for its corresponding index value as follows: start with p = 0 the number of predecessors of the current node, let l be the size of the left subtree of the current node, let v be the value of the current node, if x + p + l + 1 < v
the current node becomes the left child, otherwise the current node becomes the right child and p = p + l + 1. If there is no such child, insert x + p in the case of a left child, x + p + l + 1 in the case of a right child.
If x + p + l + 1 < v, then x cannot be in the right subtree of v. The item index corresponding to x would be x + p + l + 1 if it was the right child of v which is smaller than v. Pick any node c in the right subtree of v. Let g be the number of predecessors of c in the right subtree of v. The value of c is at least v + 1 + g because all indices are distinct. Suppose c has no right child. The item index of x as a right child of c is x + p + l + 1 + g + 1 < v + g + 1 <= c. Suppose c has no left child. The item index of x as a left child of c is x + p + l + 1 + g < v + g < v + g + 1 <= c.
If x + p + l + 1 > v, then x cannot be in the left subtree of v. If v has no left child, then l = 0 and the value of x as the left child of v would be x + p which is larger or equal to v (indices must be distinct). Pick any node c in the left subtree of v.
Let s be the number of successors of s in the left subtree of v. The value of c is at most v - s - 1 because all indices are distinct. Suppose c has no left child. The item index of x as a left child of c is x + p + l - s - 1 > v - s - 2 so x + p + l - s - 1 >= v - s - 1 = c.
The text was updated successfully, but these errors were encountered: