🤖 AI Summary
Large language models (LLMs) can accurately describe probability distributions yet suffer from sampling infidelity, which hinders their use in Monte Carlo simulation and stochastic decision-making, tasks that demand reliable randomness. The paper addresses systematic bias in Bernoulli sampling and proposes Verbalized Rejection Sampling (VRS), the first prompt-based reformulation of classical rejection sampling as an explicit, stepwise reasoning chain of "accept/reject" decisions that improves sampling fidelity. VRS requires no fine-tuning or access to model logits; it operates solely via prompting. The authors derive a theoretical upper bound on sampling error and prove that, under mild assumptions, VRS strictly outperforms direct sampling. Experiments across multiple state-of-the-art LLMs demonstrate its effectiveness, with average bias reduction exceeding 60% on coin-flip-style tasks.
📝 Abstract
Large language models (LLMs) can often accurately describe probability distributions using natural language, yet they still struggle to generate faithful samples from them. This mismatch limits their use in tasks requiring reliable stochasticity, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We investigate this gap between knowledge and sampling in the context of Bernoulli distributions. We introduce Verbalized Rejection Sampling (VRS), a natural-language adaptation of classical rejection sampling that prompts the LLM to reason about and accept or reject proposed samples. Despite relying on the same Bernoulli mechanism internally, VRS substantially reduces sampling bias across models. We provide theoretical analysis showing that, under mild assumptions, VRS improves over direct sampling, with gains attributable to both the algorithm and prompt design. More broadly, our results show how classical probabilistic tools can be verbalized and embedded into LLM workflows to improve reliability, without requiring access to model internals or heavy prompt engineering.
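To make the mechanism concrete, here is a minimal sketch of the classical rejection-sampling loop that VRS verbalizes: a Bernoulli(q) proposal is accepted with probability target(x) / (M · proposal(x)), where M bounds the density ratio, yielding exact Bernoulli(p) samples. The function name and parameters are illustrative, not the paper's implementation; VRS replaces this numeric accept/reject step with a natural-language reasoning chain carried out by the LLM.

```python
import random


def rejection_sample_bernoulli(p, q, rng=random):
    """Draw one exact sample from Bernoulli(p) via Bernoulli(q) proposals.

    Classical rejection sampling: propose x ~ Bernoulli(q), then accept x
    with probability target(x) / (M * proposal(x)), where
    M = max_x target(x) / proposal(x). Illustrative sketch only; requires
    0 < q < 1 so every outcome has nonzero proposal probability.
    """
    assert 0 < q < 1, "proposal must put mass on both outcomes"
    target = {1: p, 0: 1 - p}
    proposal = {1: q, 0: 1 - q}
    M = max(target[x] / proposal[x] for x in (0, 1))
    while True:
        x = 1 if rng.random() < q else 0  # propose from Bernoulli(q)
        if rng.random() < target[x] / (M * proposal[x]):
            return x  # accept; otherwise loop and propose again


# Empirical check: the sample mean should approach the target p = 0.3
samples = [rejection_sample_bernoulli(0.3, 0.5) for _ in range(20000)]
print(sum(samples) / len(samples))  # close to 0.3 for large sample counts
```

Even though the accepted samples are ultimately produced by the same Bernoulli proposal mechanism, the accept/reject correction removes the proposal's bias, which mirrors the paper's finding that VRS reduces bias despite relying on the same underlying Bernoulli draws.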