🤖 AI Summary
This work addresses the problem of efficiently and uniformly sampling surjections from $[n]$ to $[k]$ without assuming that $n/k$ is constant. To this end, it introduces a novel framework based on the “profile” of a surjection—a structure encoding the frequency of preimage sizes—and reduces the task to sampling such profiles. Exploiting the fact that the number of distinct profile entries is at most $\sqrt{2n}$, the authors combine an exact-size Boltzmann sampler with a three-region partitioning strategy over the parameter space. This approach achieves efficient sampling across the entire range $n \geq k$, attaining expected output size within a logarithmic factor of the information-theoretic optimum. The method resolves an open question posed by Devroye et al. regarding random mapping profile sampling and offers both theoretical optimality and practical deployability.
📝 Abstract
In this article, we develop efficient sampling algorithms for random surjections from $[n]$ to $[k]$ for all $n \geq k$. We make no assumption about $n$ and $k$. In particular, we do not make the common assumption that the ratio $\frac{n}{k}$ is constant. All our guarantees are uniform in $n$ and $k$.
Our first insight is that all the complexity in sampling random surjections is captured by sampling a smaller structure which we call the \emph{profile} of the surjection. More precisely, the profile associates to each occurring preimage size $s$ the number of preimages of size $s$. Using standard techniques, we show that the problem of sampling surjections reduces to sampling the profile with the induced distribution. This is partly explained by the fact that profiles are always sublinear, with at most $\sqrt{2n}$ entries in the worst case.
We provide a complete set of algorithms to directly sample the \emph{profile} of a random surjection with the induced distribution, covering the full parameter space. These algorithms are shown to be optimal up to logarithmic factors in the expected size of the output. Our algorithms are based on exact-size Boltzmann samplers, which are standard rejection-based samplers. We partition the parameter space into three main regions. In each region, we optimize both the rejection rate and the cost of each sampling round.
Profiles capture a number of relevant statistics of random surjections and might be of independent interest. In a related context, profiles have been recently studied by Devroye et al. for random mappings. As a spin-off result, we answer an open question from Devroye and Los '25 by providing an optimal algorithm also for the profiles of a random mapping when $k > n/\log n$.
The results of this article are not only of theoretical interest but lead to samplers implementable in practice.