Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling

📅 2025-06-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) can accurately describe probability distributions, yet the samples they generate are often unfaithful to those distributions, hindering their use in Monte Carlo simulation and stochastic decision-making, tasks that demand reliable randomness. This paper addresses systematic bias in Bernoulli sampling and proposes Verbalized Rejection Sampling (VRS), the first prompt-based reformulation of classical rejection sampling as an explicit, stepwise reasoning chain ending in an accept/reject decision. VRS requires no fine-tuning or access to model logits; it operates solely via prompting. The authors derive a theoretical upper bound on sampling error and prove that, under mild assumptions, VRS strictly outperforms direct sampling. Extensive experiments across multiple state-of-the-art LLMs demonstrate its effectiveness: average bias reduction exceeds 60% on coin-flip-style tasks.

📝 Abstract
Large language models (LLMs) can often accurately describe probability distributions using natural language, yet they still struggle to generate faithful samples from them. This mismatch limits their use in tasks requiring reliable stochasticity, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We investigate this gap between knowledge and sampling in the context of Bernoulli distributions. We introduce Verbalized Rejection Sampling (VRS), a natural-language adaptation of classical rejection sampling that prompts the LLM to reason about and accept or reject proposed samples. Despite relying on the same Bernoulli mechanism internally, VRS substantially reduces sampling bias across models. We provide theoretical analysis showing that, under mild assumptions, VRS improves over direct sampling, with gains attributable to both the algorithm and prompt design. More broadly, our results show how classical probabilistic tools can be verbalized and embedded into LLM workflows to improve reliability, without requiring access to model internals or heavy prompt engineering.
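The accept/reject mechanism that VRS expresses in natural language corresponds to classical rejection sampling over a Bernoulli target. A minimal numeric sketch of that underlying algorithm (function and parameter names are illustrative, not taken from the paper's prompts):

```python
import random

def rejection_sample_bernoulli(p_target, q_proposal, rng=random.random):
    """Sample from Bernoulli(p_target) via rejection sampling with
    Bernoulli(q_proposal) proposals -- the classical procedure that
    VRS asks the LLM to carry out step by step in natural language.
    Assumes 0 < p_target < 1 and 0 < q_proposal < 1."""
    # Envelope constant M >= p(x) / q(x) for both x in {0, 1}.
    M = max(p_target / q_proposal, (1 - p_target) / (1 - q_proposal))
    while True:
        x = 1 if rng() < q_proposal else 0              # propose x ~ Bernoulli(q)
        px = p_target if x == 1 else 1 - p_target       # target mass at x
        qx = q_proposal if x == 1 else 1 - q_proposal   # proposal mass at x
        if rng() < px / (M * qx):                       # accept w.p. p(x) / (M q(x))
            return x
```

Accepted samples are distributed exactly as Bernoulli(p_target) regardless of the proposal parameter; the paper's point is that verbalizing this accept/reject loop reduces the bias an LLM exhibits when asked to sample directly.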
Problem

Research questions and friction points this paper is trying to address.

Reducing bias in LLM-generated random samples
Improving LLM sampling fidelity for Bernoulli distributions
Enhancing stochastic task reliability without internal access
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verbalized Rejection Sampling reduces bias
Natural-language adaptation of classical sampling
Improves reliability without model internals