1. Background
Sampling a target distribution from a random physical source has many applications. However, physical random sources are often biased, with unknown distributions, while applications require a specific target distribution. Therefore, an efficient algorithm to generate a target distribution from a random source is of great value. A simple method to generate a fair binary distribution from an unfair binary source with an unknown bias was first proposed in [Reference Von Neumann7], and this method has served as the precursor of a series of algorithms that generate a target distribution from an unknown random source.
Von Neumann’s method was improved in [Reference Hoeffding and Simons3, Reference Stout and Warren6] to generate a fair binary distribution from a biased random source. From the point of view of probability theory, [Reference Elias2] formally defined the kind of random procedure that can generate a target distribution. Elias also designed an infinite sequence of sampling schemes, with computational cost decreasing to the theoretical lower bound, though without providing an executable algorithm for the method. Elias’ method needs to generate Elias’ function first; such a preprocessing step needs exponential space and at least polynomial time [Reference Pae5], and thus Elias’ method is computationally costly and inefficient.
Another method, for generating a uniform distribution on a set of p elements for p prime, was provided in [Reference Dijkstra1], though Dijkstra’s method is computationally inefficient. Indeed, in realizing this method we need a preprocessing step to generate and store a function which maps outcomes from the random source to some target values. However, such a preprocessing step needs an exponential time and space cost.
In this article we propose a new algorithm based on the idea of Dijkstra’s method. The proposed algorithm does not need a preprocessing step, and is thus computationally efficient. The article is organized as follows. In Section 2 we briefly recall Von Neumann’s method, both as a starting point and as a special case of our algorithm. In Section 3 we heuristically construct and explain our algorithm. In Sections 4 and 5 we formally propose our algorithms and verify them theoretically; in Section 5 we also prove that our algorithm has overall sublinear average running time. An alternative proof of Theorem 1 is given in Appendix A.
2. Introduction to Von Neumann’s method
Let $X\in \{ H, T \}$ denote the outcome of a biased coin flip with probability $a=\mathbb{P}(X=H)\in (0,1)$ of getting a head and probability $b=\mathbb{P}(X=T)=1-a$ of getting a tail. Let $\{X_i\;:\; i\ge 0\}$ be independent and identically distributed (i.i.d.) copies of X. Von Neumann proposed an algorithm $\mathcal{A}_1$ to generate a fair binary random variable with distribution $\mathbb{P}(\mathcal{A}_1=0)=\mathbb{P}(\mathcal{A}_1=1)=\frac12$ as shown in Algorithm 1 [Reference Von Neumann7].
Let $\{ {\boldsymbol{{Y}}}_i = (X_{2i}, X_{2i+1})\;:\; i \ge 0 \}$ be i.i.d. outcomes of pairs of flips, and $\tau$ be the first time that ${\boldsymbol{{Y}}}_i \in \{ HT, TH \}$ ; then
This derivation shows that $\mathcal{A}_1$ generates a fair binary distribution. Below, we propose an efficient algorithm to generate a uniform distribution on p elements for p prime. At each cycle we flip a coin p times; the algorithm returns a number in $\{ 0, \ldots, p-1 \}$ except when the p flips are all heads or all tails, analogous to Von Neumann’s method.
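As a concrete illustration, here is a minimal Python sketch of Von Neumann’s procedure. The mapping HT → 0, TH → 1 is an assumption for this sketch (Algorithm 1 may fix the opposite convention; either yields a fair bit, since $\mathbb{P}(HT)=\mathbb{P}(TH)=ab$):

```python
import random

def biased_flip(a, rng):
    """One flip of a coin with head probability a."""
    return 'H' if rng.random() < a else 'T'

def von_neumann(a, rng):
    """Von Neumann's procedure: flip in pairs, discard HH and TT,
    and map the first HT/TH pair to 0/1 (convention assumed here)."""
    while True:
        pair = (biased_flip(a, rng), biased_flip(a, rng))
        if pair == ('H', 'T'):
            return 0
        if pair == ('T', 'H'):
            return 1
```

Because the two accepting outcomes HT and TH have the same probability $ab$, the returned bit is fair for any $a\in(0,1)$.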
3. Heuristic explanation for the main idea
Let the random vector ${\boldsymbol{{X}}}^{n}= ( X_0, \ldots, X_{n-1} )\in \{H, T \}^n$ be the outcome of n flips. Let $N_{\text{head}}({\boldsymbol{{X}}}^{n})$ denote the head count in ${\boldsymbol{{X}}}^{n}$ , and $S_{\text{head}}({\boldsymbol{{X}}}^{n})$ denote the rank sum of heads in ${\boldsymbol{{X}}}^{n}$ , with rank beginning from 0:
For example, when ${\boldsymbol{{X}}}^{5}=(H,H,T,H,T)$ , we have $N_{\text{head}}({\boldsymbol{{X}}}^{5})=3$ and $S_{\text{head}}({\boldsymbol{{X}}}^{5})=4.$
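The two statistics are straightforward to compute; a small Python check of the example above (heads occur at ranks 0, 1, and 3, so the rank sum is 4):

```python
def n_head(x):
    """Head count N_head of a flip sequence."""
    return sum(1 for flip in x if flip == 'H')

def s_head(x):
    """Rank sum of heads S_head, with ranks starting from 0."""
    return sum(rank for rank, flip in enumerate(x) if flip == 'H')

x5 = ('H', 'H', 'T', 'H', 'T')
```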
For a specific sequence of n flips ${\boldsymbol{{x}}}^n=(x_0,\ldots, x_{n-1})\in \{H, T \}^n$ as an observation of ${\boldsymbol{{X}}}^{n}$ , if $N_{\text{head}}({\boldsymbol{{x}}}^n)=\sum_{i=0}^{n-1}\mathbf{1}_{\{x_i=H\}}=k$ , then the probability of getting ${\boldsymbol{{x}}}^n$ in n flips is $\mathbb{P}({\boldsymbol{{X}}}^{n} = {\boldsymbol{{x}}}^n) = \prod_{i=0}^{n-1}\mathbb{P}(X_i = x_i)=a^k b^{n-k}$ , which only depends on the head count k. As a result, for $0\le k\le n$ , there are exactly $\binom{n}{k}$ outcomes of n flips containing k heads, each with the same probability $a^k b^{n-k}$ . Let
where $|A|$ denotes the cardinality of a set A. Thus, $S_k$ is the set of all subsets of $\{ 0, \ldots, n-1 \}$ containing k elements. Note that $|S_k|=\binom{n}{k}$ , and each element in $S_k$ corresponds to one and only one outcome of n flips with k heads in the following way:
where each $i_t$ , with $i_1 < i_2 <\cdots <i_k$ , is the rank of a flip among the n that shows a head. As a result, we have the one-to-one correspondence
and we also have $\mathbb{P}({\boldsymbol{{X}}}^{n} = {\boldsymbol{{x}}}^n)=a^{k}b^{n-k}$ for all ${\boldsymbol{{x}}}^n \in S_k$ . Note that in the correspondences (3) and (4) we do not distinguish the left- and right-hand sides in the derivations below, and these equivalences are used frequently in the following proofs.
Inspired by Von Neumann’s algorithm, we consider an algorithm generating a distribution on the set $\{ 0, \ldots, n-1 \}$ . At each cycle we flip the coin n times, then the algorithm returns a number in $\{ 0, \ldots, n-1 \}$ except when the outcome is all heads or all tails. Define the sets $\{ A_m \;:\; 0\le m\le n-1\}$ to be a disjoint partition of $\bigsqcup_{1\le k\le n-1}S_k$ ,
where $\bigsqcup$ means disjoint union. The algorithm, $\mathcal{A}$ , is formally stated in Algorithm 2.
Let $\{{\boldsymbol{{Y}}}_i = (X_{in}, \ldots, X_{in+n-1})\;:\; i\ge 0\}$ be i.i.d. outcomes of n flips, and $\tau$ be the first time ${\boldsymbol{{Y}}}_i$ is neither all heads nor all tails. Then, for $0 \le m \le n-1$ , we have
Let us consider a special case of algorithm $\mathcal{A}$ where n is a prime, p. The reason for focusing on prime p comes from the fact in number theory that $p \mid \binom{p}{k} = |S_k|$ for all $1\le k\le p-1$ , where the symbol $\mid$ means ‘divides’. Then, for each k, we can partition $S_k$ into p disjoint parts of equal size. For $1\le k \le p-1$ , assume that the choice of sets $\{ A_m\;:\; 0 \le m \le p-1 \}$ satisfies
where the disjoint $\{ A_m \cap S_k\;:\; 0\le m\le p-1 \}$ partition $S_k$ into p subsets of equal size. Based on (5) and (6), for $0 \le m\le p-1$ we have
which means algorithm $\mathcal{A}$ returns a uniform distribution on $\{ 0, \ldots, p-1 \}$ .
What remains is to find $\{ A_m\;:\; 0\le m\le p-1 \}$ satisfying (6). We can always first partition $S_k$ into p subsets of equal size, and then define $\{ A_m \cap S_k\;:\; 0\le m\le p-1 \}$ to be these subsets, as in the method proposed in [Reference Dijkstra1]. However, this approach has two disadvantages. First, there are many ways of partitioning $S_k$ into subsets of equal size, and there is no widely accepted standard. Second, partitioning $\{S_k\;:\; 1\le k\le p-1\}$ and designing $\{ A_m\;:\; 0\le m\le p-1 \}$ need excessive time and storage costs, because there are $2^p$ different outcomes of p flips to handle, a number that grows exponentially as p increases. A preprocessing step of exponential time is unacceptable for an efficient algorithm.
With the help of the modulo p function, there is an ingenious way of designing $\{ A_m\;:\; 0\le m\le p-1 \}$ to satisfy (6). Based on the correspondence in (3), for $0\le m\le p-1$ we can indeed choose
as we show in the next section.
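Before the formal proof, the claim behind this choice — that within each $S_k$ the residue of the element sum modulo p splits the $\binom{p}{k}$ subsets into p classes of equal size — can be checked exhaustively for small primes. A brute-force Python check (the function name `equal_split` is ours, not from the paper):

```python
from itertools import combinations
from math import comb

def equal_split(p):
    """For each 1 <= k <= p-1, count the k-subsets of {0,...,p-1}
    by the residue of their element sum mod p; report whether every
    residue class contains exactly C(p, k) / p subsets."""
    for k in range(1, p):
        counts = [0] * p
        for subset in combinations(range(p), k):
            counts[sum(subset) % p] += 1
        if any(c != comb(p, k) // p for c in counts):
            return False
    return True
```

Note that primality matters: the check fails already for $p=4$, $k=2$, where the six 2-subsets do not split evenly into four classes.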
4. Generating a uniform distribution on p (prime) elements
We provide an algorithm, $\mathcal{A}_2(p)$ , that generates a discrete uniform distribution on the set $\{0, \ldots, p-1 \},$ where p is a prime, as listed in Algorithm 3.
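A direct Python sketch of $\mathcal{A}_2(p)$, assuming Algorithm 3 follows the construction of Section 3 (flip p coins per cycle; if they are neither all heads nor all tails, return the rank sum of heads modulo p, otherwise repeat):

```python
import random

def a2(p, a, rng):
    """Sketch of A2(p): repeat cycles of p biased flips until the
    outcome is neither all heads nor all tails, then return the
    rank sum of heads modulo p."""
    while True:
        heads = [i for i in range(p) if rng.random() < a]
        if 0 < len(heads) < p:
            return sum(heads) % p
```

For a heavily biased coin the sampled frequencies should still be close to $1/p$, which Theorem 1 below establishes exactly.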
We need the following lemma before proving the main theorem.
Lemma 1. Let p be a prime number, and, for $1\le k\le p-1$ , let $S_k$ be the set of all subsets of $\{ 0,\ldots, p-1 \}$ having k elements. For fixed k, let $\{ S_k^m \;:\; 0\le m\le p-1 \}$ be defined by
Note that $S^m_k = A_m \cap S_k$ , where $A_m$ is defined in (7).
Then we have $|S_k^m| = \frac{1}{p}\binom{p}{k}$ for all $0\le m\le p-1$ .
Proof. For fixed $1\le k\le p1$ , consider a permutation on $S_k$ defined by
Denote by $f^0$ the identity function $\operatorname{id}$ . Let $\langle f\rangle$ be the subgroup generated by f. We need to show that $\langle f\rangle = \{f^0=\operatorname{id}, f^1,\ldots, f^{p-1}\}$ . Since we know that $f^{p}=\operatorname{id}$ , we need to show $f^s\neq \operatorname{id}$ for $1\le s\le p-1$ .
If $f^s= \operatorname{id}$ for some $1\le s\le p-1$ , then for every $\{ i_1,\ldots,i_k \}\in S_k$ we have $\{ i_1+s,\ldots,i_k+s \} = \{ i_1,\ldots,i_k \} \,(\textrm{mod}\,p)$ ,
from which we have $\sum_{j=1}^k(i_j+s) = \sum_{j=1}^k i_j \,(\textrm{mod}\,p)$ . The equality above shows $p\mid ks$ , which implies $p\mid k$ or $p\mid s$ , leading to a contradiction since $1\le k,s \le p-1$ .
Let the group $\langle f\rangle$ act on $S_k$ . For $\{ i_1,\ldots,i_k \}\in S_k$ , let $O_{\{ i_1,\ldots,i_k \}}$ denote the orbit of $\{ i_1,\ldots,i_k \}$ under group action,
The theory of group actions tells us that $S_k$ can be divided into disjoint orbits, each of size p. In addition, for any $\{ i_1,\ldots,i_k \}\in S_k$ , when s varies from 0 to $p-1$ , $\sum_{j=1}^k i^s_j \,(\textrm{mod}\,p)$ takes all values in $\{0,\ldots, p-1\}$ .
If the claim above were not true, then there would exist $0\le s_1<s_2 \le p-1$ such that $\sum_{j=1}^k (i_j+s_1) = \sum_{j=1}^k (i_j+s_2) \,(\textrm{mod}\,p)$ .
The equality above shows that $p\mid k(s_2-s_1)$ , which implies $p\mid k$ or $p\mid(s_2-s_1)$ , leading to a contradiction since $1\le k, s_2-s_1\le p-1$ .
The argument above shows that $S_k$ is a union of disjoint orbits of equal size p. And in each orbit, for $0\le m\le p-1$ , there exists one and only one element belonging to $S^m_k$ , which means the $\{ S_k^m \;:\; 0\le m\le p-1 \}$ partition $S_k$ into p subsets of equal size, and the proof is complete.
Figure 1 illustrates the idea of the proof in the special case $p=7$ and $k=3$ .
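The orbit structure used in the proof can also be made concrete in Python for the figure’s case $p=7$, $k=3$: the permutation f shifts every element by 1 mod 7, the orbits all have size 7, and the element sums within one orbit run through all residues:

```python
from itertools import combinations

p, k = 7, 3

def f(subset):
    """The permutation f: shift every element by 1 modulo p."""
    return frozenset((i + 1) % p for i in subset)

# Collect the orbits of <f> acting on the k-subsets of {0,...,p-1}.
orbits, seen = [], set()
for s in combinations(range(p), k):
    s = frozenset(s)
    if s in seen:
        continue
    orbit, cur = [], s
    for _ in range(p):
        orbit.append(cur)
        cur = f(cur)
    orbits.append(orbit)
    seen.update(orbit)
```

There are $\binom{7}{3}/7 = 5$ orbits, each containing exactly one subset per residue of the element sum.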
Next, we prove the main theorem on algorithm $\mathcal{A}_2(p)$ .
Theorem 1. Let X denote a biased coin with probability $a \in (0, 1)$ of getting a head, and probability $b = 1 - a$ of getting a tail. For a prime p, $\mathcal{A}_2(p)$ has the following properties:

(i) $\mathcal{A}_2(p)$ terminates in a finite number of flips with probability 1. The algorithm returns a uniform distribution on $\{0,\ldots,p-1\}\;:\;$ $\mathbb{P}(\mathcal{A}_2(p)=m) = {1}/{p}$ for all $0\le m\le p-1$ .

(ii) The expected number of flips terminating $\mathcal{A}_2(p)$ is ${p}/({1-a^{p}-b^{p}})$ , which means that when p is large, the average running time is approximately linear, $\operatorname{O}\!(p)$ .

(iii) By letting $p=2$ , $\mathcal{A}_2(2)$ is exactly Von Neumann’s algorithm $\mathcal{A}_1$ .
Proof. Let ${\boldsymbol{{X}}}^{p}=(X_0, \ldots, X_{p-1})$ be the outcome of p flips of a biased coin, a random variable taking values in $\{H,T\}^{p}$ . Based on the correspondences in (3) and (4), and the definition of $S^m_k$ in (8), each ${\boldsymbol{{x}}}^p \in \{H,T\}^{p}$ corresponds to one and only one element in $S^m_k$ by $N_{\text{head}}({\boldsymbol{{x}}}^p)=k$ and $S_{\text{head}}({\boldsymbol{{x}}}^p)=m\,(\textrm{mod}\,p)$ for some k and m, where $N_{\text{head}}({\boldsymbol{{x}}}^p)$ and $S_{\text{head}}({\boldsymbol{{x}}}^p)$ in (1) are the count and rank sum of heads respectively. Recall the definition of $S_k$ in (2); then, by Lemma 1, the $\{ S_k^m \;:\; 0\le m\le p-1 \}$ partition $S_k$ into p subsets of equal size.
Let $\{{\boldsymbol{{Y}}}_i = (X_{ip}, \ldots, X_{ip+p-1})\;:\; i\ge 0\}$ be i.i.d. outcomes of p flips, and $\tau$ be the first time ${\boldsymbol{{Y}}}_i$ is neither all heads nor all tails. Then, for $0 \le m \le p-1$ , we have
where the last identity is implied by the fact that $|S^m_k| = \frac{1}{p}\binom{p}{k}=\frac{1}{p}|S_k|$ .
Let E denote the expected number of flips terminating $\mathcal{A}_2(p)$ . Each cycle uses p flips and restarts with probability $a^{p}+b^{p}$ , so E satisfies $E = p + (a^{p}+b^{p})E$ ,
from which we have $E = {p}/({1-a^{p}-b^{p}})$ .
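The closed form for E can be checked by simulation; a rough Python check under the assumed parameters $p=3$, $a=0.6$ (our choice for illustration):

```python
import random

def flips_used(p, a, rng):
    """Number of flips one call of A2(p) consumes: p flips per cycle,
    repeated until the cycle is neither all heads nor all tails."""
    total = 0
    while True:
        x = [rng.random() < a for _ in range(p)]
        total += p
        if 0 < sum(x) < p:
            return total

p, a = 3, 0.6
b = 1 - a
expected = p / (1 - a**p - b**p)   # the formula derived above
rng = random.Random(1)
avg = sum(flips_used(p, a, rng) for _ in range(20000)) / 20000
```

With these parameters the formula gives $3/0.72 \approx 4.17$ expected flips, and the simulated average should land close to it.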
We also give a short alternative proof of Theorem 1(i), using random variables in the residue class ring $\mathbb{Z}_{p}$ ; see Appendix A.
5. Generating a uniform distribution on n elements
Let n be any positive integer with prime factorization $n=\prod_{i=1}^s p_i^{t_i}$ . Let $\mathcal{M}$ be the multiset of all prime factors of n counted with multiplicity, which means $p_i$ appears $t_i$ times in $\mathcal{M}$ . The algorithm $\mathcal{A}_3(n)$ shown in Algorithm 4 generates a discrete uniform distribution on the set $\{0,\ldots, n-1\}$ in an iterative way.
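A Python sketch of the iterative construction, assuming Algorithm 4 combines the per-factor outcomes in the mixed-radix fashion described in the proof below (the helper `a2` re-implements the Section 4 algorithm; `prime_multiset` is our own helper name):

```python
import random

def a2(p, a, rng):
    """A2(p) as in Section 4: rank sum of heads mod p on the first
    cycle of p flips that is neither all heads nor all tails."""
    while True:
        heads = [i for i in range(p) if rng.random() < a]
        if 0 < len(heads) < p:
            return sum(heads) % p

def prime_multiset(n):
    """The multiset M of prime factors of n, with multiplicity."""
    out, d = [], 2
    while d * d <= n:
        while n % d == 0:
            out.append(d)
            n //= d
        d += 1
    if n > 1:
        out.append(n)
    return out

def a3(n, a, rng):
    """Sketch of A3(n): one run of A2(p') per prime factor p' of n,
    combined in mixed radix so the result is uniform on {0,...,n-1}."""
    m = 0
    for p in prime_multiset(n):
        m = m * p + a2(p, a, rng)
    return m
```

For example, with $n=6=2\cdot 3$ the result is $m_1\cdot 3 + m_2$ with $m_1$ uniform on $\{0,1\}$ and $m_2$ uniform on $\{0,1,2\}$.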
The following theorem shows the validity of algorithm $\mathcal{A}_3(n)$ .
Theorem 2. For any integer n, $\mathcal{A}_3(n)$ has the following properties:

(i) $\mathcal{A}_3(n)$ terminates in a finite number of flips with probability 1. It returns a uniform distribution on $\{0, \ldots, n-1\}\;:\;$ $\mathbb{P}(\mathcal{A}_3(n)=m)={1}/{n}$ for all $0\le m\le n-1$ .

(ii) When n has prime factorization $\prod_{i=1}^s p_i^{t_i}$ , the expected number of flips terminating $\mathcal{A}_3(n)$ is
\begin{equation*} \sum_{i=1}^s \frac{t_i p_i}{1-a^{p_i} - b^{p_i}}. \end{equation*}Therefore, the average running time is approximately $\sum_{i=1}^s t_i p_i$ for large n. 
(iii) The overall order of the average running time is $\operatorname{O}\!(n/\log\!(n))$ .
Proof. To show (i), note that each outcome of $\mathcal{A}_3(n)$ corresponds to one and only one sequence of outcomes of $\mathcal{A}_2(p_i)$ . To see this, first consider a simplified case where $n=p_1p_2$ is a product of two prime numbers $p_1$ and $p_2$ , where $p_1$ may equal $p_2$ .
Given $n=p_1 p_2$ , then $\mathcal{M} = \{p_1, p_2 \}$ . Suppose we first get $p_1$ from $\mathcal{M}$ and then $p_2$ . Then the outcomes $\mathcal{A}_2(p_1)=m_1$ and $\mathcal{A}_2(p_2)=m_2$ correspond to the outcome $\mathcal{A}_3(n)=m_1 p_2 + m_2$ . Since $0\le m_1\le p_1-1$ and $0\le m_2\le p_2-1$ , we have the range for $\mathcal{A}_3(n)$ :
which shows that $\mathcal{A}_3(n)\in \{0, \ldots, n-1\}$ . Note that, for $0\le m\le n-1$ , there exists one and only one pair $(m_1, m_2)$ , namely $m_1 = \lfloor m/p_2 \rfloor$ and $m_2 = m \,(\textrm{mod}\,p_2)$ ,
satisfying the equation $m = m_1p_2+m_2$ $(0\le m_1\le p_1-1, \, 0\le m_2\le p_2-1)$ . So, the outcome $\mathcal{A}_3(n) = m$ corresponds to the outcomes $\mathcal{A}_2(p_1)=m_1$ and $\mathcal{A}_2(p_2)=m_2$ .
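This one-to-one correspondence is just division with remainder; a quick Python check for the illustrative case $p_1=3$, $p_2=5$ (our choice of parameters):

```python
p1, p2 = 3, 5
n = p1 * p2

# For every m in {0,...,n-1}, division by p2 recovers the unique
# pair (m1, m2) with m = m1*p2 + m2 in the stated ranges.
pairs = set()
for m in range(n):
    m1, m2 = divmod(m, p2)        # m1 = floor(m / p2), m2 = m mod p2
    assert 0 <= m1 <= p1 - 1 and 0 <= m2 <= p2 - 1
    assert m == m1 * p2 + m2
    pairs.add((m1, m2))
```

All n pairs are distinct, so the map $m \mapsto (m_1, m_2)$ is a bijection.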
For the general case $n=\prod_{i=1}^s p_i^{t_i}$ , based on the same method as above, we conclude that, for each m, there exists a unique set $\{ m_{p^\prime}\;:\; p^\prime\in \mathcal{M}\}$ such that the outcome $\mathcal{A}_3(n) = m$ corresponds to the outcomes $\mathcal{A}_2(p^\prime)=m_{p^\prime}$ $(p^\prime \in \mathcal{M})$ . Therefore, since the runs of $\mathcal{A}_2$ are independent, the probability of $\mathcal{A}_3(n)=m$ is $\prod_{p^\prime \in \mathcal{M}} \mathbb{P}(\mathcal{A}_2(p^\prime)=m_{p^\prime}) = \prod_{p^\prime \in \mathcal{M}} \frac{1}{p^\prime} = \frac{1}{n}$ .
To prove (ii), note that, for $n=\prod_{i=1}^s p_i^{t_i}$ , the multiset $\mathcal{M}$ contains each prime factor $p_i$ exactly $t_i$ times. By the iterative construction of $\mathcal{A}_3(n)$ , we need to run $\mathcal{A}_2(p_i)$ once every time we pick $p_i$ from $\mathcal{M}$ . Based on Theorem 1(ii), the expected number of flips for $\mathcal{A}_2(p_i)$ is ${p_i}/({1 - a^{p_i} - b^{p_i}})$ , from which we conclude that the expected number of flips terminating $\mathcal{A}_3(n)$ is $\sum_{i=1}^s \frac{t_i p_i}{1-a^{p_i} - b^{p_i}}$ .
To analyze the average running time of the algorithm $\mathcal{A}_3(n)$ , define the function $c(n)=\sum_{i=1}^s t_i p_i$ , the sum of the prime factors of n counted with multiplicity, which is a good approximation to the average running time of $\mathcal{A}_3(n)$ according to Theorem 2(ii). For prime numbers the complexity is linear, while for composite numbers it is sublinear. For $n=p_1^{t_1}$ with $p_1$ fixed, since $c(n)=t_1 p_1$ , the average running time is of order $\log\!(n)$ . We have the following theorem from number theory [Reference Jakimczuk4, Corollary 2.11]:
So, we have an overall sublinear $\operatorname{O}\!(n/\log\!(n))$ complexity for the algorithm $\mathcal{A}_3(n)$ .
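The function c(n) is easy to compute; a small Python helper with a few sanity checks illustrating the three regimes above ($c(7)=7$ is linear, $c(12)=2+2+3=7$ is sublinear, and $c(2^{10})=20$ is logarithmic in $n=1024$):

```python
def c(n):
    """Sum of the prime factors of n counted with multiplicity,
    approximating the average running time of A3(n)."""
    total, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            total += d
            n //= d
        d += 1
    if n > 1:
        total += n
    return total
```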
Remark 1. In [Reference Elias2], another method for generating a discrete uniform distribution on the set $\{0,\ldots ,n - 1\}$ was proposed. Elias’ method needs Elias’ function, which maps outcomes of the random source to target values. However, unlike Theorem 2(iii), the efficiency of Elias’ method is described by complicated mathematical formulas without an analytic, concise form, and is thus hard to analyze theoretically. Besides, Elias’ method suffers from the same problem as Dijkstra’s method mentioned in Section 3: the computation of Elias’ function, an essential preprocessing step of Elias’ method, is computationally inefficient, and the storage of Elias’ function incurs an excessive space cost.
Appendix A. A new proof for Theorem 1(i)
Consider random variables taking values in $\mathbb{Z}_p=\{ \bar{0}, \ldots, \overline{p-1} \}$ , where $\overline{i}$ represents the residue class of i modulo p. Regard $\bar{0}$ as a tail and $\bar{1}$ as a head. Let X denote the outcome of a flip satisfying $\mathbb{P}(X=\overline{1})=a$ and $\mathbb{P}(X=\overline{0})=b$ . Let $X_0, \ldots, X_{p-1}$ be independent copies of X. Define ${\boldsymbol{{X}}}^{p}=(X_0,\ldots, X_{p-1})$ to be the outcome of p flips. We then have the following equivalences:
Also note that, for any permutation $\sigma$ , we have $(X_0,\ldots, X_{p-1}) \overset{\textrm{d}}{=} (X_{\sigma(0)},\ldots, X_{\sigma(p-1)})$ , since all the $X_i$ are i.i.d. In the following, we let $\sigma$ denote the special permutation
For fixed $t\in \mathbb{Z}_{p}$ with $t\neq \bar{0}$ , we have
Note that any $t\neq \bar{0}$ can generate $\mathbb{Z}_{p}$ . By iterating the derivation above, we have
Summing over $t \neq \bar{0}$ on both sides of the above equation, we have, for $k,s\in \mathbb{Z}_{p}$ ,
which implies, for $k,s\in \mathbb{Z}_{p}$ ,
This equality is equivalent to the statement
for all $0\le k,s\le p-1$ , as desired.
Acknowledgements
The author acknowledges Prof. Mei Wang at UChicago for helpful discussions and advice, and thanks PhD candidate Haoyu Wei at UCSD for useful suggestions and kind support. The author also acknowledges the editor of Journal of Applied Probability and the two anonymous referees for their valuable comments and remarks.
Funding information
There are no funding bodies to thank relating to the creation of this article.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.