Skip to content

In-place sampling #8

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
make-github-pseudonymous-again opened this issue Mar 3, 2017 · 1 comment
Open

In-place sampling #8

make-github-pseudonymous-again opened this issue Mar 3, 2017 · 1 comment
Assignees
Labels

Comments

@make-github-pseudonymous-again
Copy link
Member

make-github-pseudonymous-again commented Mar 3, 2017

If you want to sample k items out of n, Fisher Yates shuffles your item array.
Hereunder is an efficient no shuffle implementation. Something better can probably be achieved with prefix sums and rank/select.

Theorem

We can sample k items out of n with k random draws and O(k log k) time, without modifying the items array.

Proof

TODO fix proof

Given n items, at draw i=1,2,...,n with a binary search tree containing all previously drawn items indices between 0 and n-1, pick a random integer x between 0 and n - i. Binary search for its corresponding index value as follows: start with p = 0 the number of predecessors of the current node, let l be the size of the left subtree of the current node, let v be the value of the current node, if x + p + l + 1 < v
the current node becomes the left child, otherwise the current node becomes the right child and p = p + l + 1. If there is no such child, insert x + p in the case of a left child, x + p + l + 1 in the case of a right child.

If x + p + l + 1 < v, then x cannot be in the right subtree of v. The item index corresponding to x would be x + p + l + 1 if it was the right child of v which is smaller than v. Pick any node c in the right subtree of v. Let g be the number of predecessors of c in the right subtree of v. The value of c is at least v + 1 + g because all indices are distinct. Suppose c has no right child. The item index of x as a right child of c is x + p + l + 1 + g + 1 < v + g + 1 <= c. Suppose c has no left child. The item index of x as a left child of c is x + p + l + 1 + g < v + g < v + g + 1 <= c.

If x + p + l + 1 > v, then x cannot be in the left subtree of v. If v has no left child, then l = 0 and the value of x as the left child of v would be x + p which is larger or equal to v (indices must be distinct). Pick any node c in the left subtree of v.
Let s be the number of successors of s in the left subtree of v. The value of c is at most v - s - 1 because all indices are distinct. Suppose c has no left child. The item index of x as a left child of c is x + p + l - s - 1 > v - s - 2 so x + p + l - s - 1 >= v - s - 1 = c.

@make-github-pseudonymous-again
Copy link
Member Author

Related #18

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant