Probabilistic Programs of Thought

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high GPU cost that large language models incur when programs for code generation and mathematical reasoning are sampled repeatedly. The authors propose a test-time framework that turns a generated program, together with the model's token-level next-token probabilities, into a probabilistic program that compactly represents exponentially many deterministic programs. Because inference in this probabilistic program is lightweight and runs on the CPU, new programs can be sampled without additional calls to the large language model. The authors report improved performance on code generation, code understanding, and mathematical reasoning benchmarks while using fewer LLM generations.
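The idea of a compact representation can be illustrated with a minimal Python sketch. It is not the authors' construction. It assumes a hypothetical input format, in which each generated token is paired with a dictionary of next-token probabilities. It also uses a simple rule invented for illustration: a position becomes a "choice point" when the model's chosen token has probability below a threshold. Under the further assumption that choice points are independent, a handful of them already stands for a product of alternatives, which is exponential in the number of choice points. The names `ChoicePoint`, `build_probabilistic_program`, `top_k`, and `threshold` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ChoicePoint:
    """One uncertain position: alternative tokens and their renormalized probabilities."""
    tokens: list
    probs: list


def build_probabilistic_program(token_dists, top_k=2, threshold=0.9):
    """Convert one LLM generation into fixed tokens plus choice points.

    token_dists: list of (chosen_token, {candidate: prob}) pairs (hypothetical format).
    top_k and threshold are illustrative hyperparameters, not taken from the paper.
    """
    parts = []
    for chosen, dist in token_dists:
        if dist.get(chosen, 0.0) >= threshold:
            parts.append(chosen)  # confident position: keep the token as-is
        else:
            # Uncertain position: keep the top_k candidates and renormalize them.
            top = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
            total = sum(p for _, p in top)
            parts.append(ChoicePoint([t for t, _ in top], [p / total for _, p in top]))
    return parts


def count_programs(parts):
    """Number of deterministic programs represented: product of alternatives per choice point."""
    return math.prod(len(p.tokens) for p in parts if isinstance(p, ChoicePoint))


def program_probability(parts, tokens):
    """Probability of one concrete token sequence, assuming independent choice points."""
    if len(tokens) != len(parts):
        return 0.0
    prob = 1.0
    for part, tok in zip(parts, tokens):
        if isinstance(part, ChoicePoint):
            if tok not in part.tokens:
                return 0.0
            prob *= part.probs[part.tokens.index(tok)]
        elif part != tok:
            return 0.0  # a fixed position must match exactly
    return prob


if __name__ == "__main__":
    # Toy generation "x = 3 * 7" with two uncertain positions (made-up probabilities).
    generation = [
        ("x", {"x": 0.99}),
        (" = ", {" = ": 0.98}),
        ("3", {"3": 0.6, "4": 0.3, "5": 0.1}),
        (" * ", {" * ": 0.7, " + ": 0.3}),
        ("7", {"7": 0.95}),
    ]
    pp = build_probabilistic_program(generation)
    print(count_programs(pp))  # 4
    print(program_probability(pp, ["x", " = ", "4", " + ", "7"]))  # roughly 0.1 (= 1/3 * 0.3)
```

With m choice points of k alternatives each, the representation stores m·k tokens but stands for k^m programs. This is the sense in which one generation can be "exponentially" reused.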

📝 Abstract

LLMs are widely used for code generation and mathematical reasoning tasks where they are required to generate structured output. They either need to reason about code, generate code for a given specification, or reason using programs of thought. The typical approach to code generation is to prompt the model and generate samples until an appropriate program is obtained. In this process, sampling $n$ programs from the language model requires $n$ GPU-intensive generations, which becomes prohibitively expensive for larger values of $n$. In this work, we address this limitation by exposing the LLM's distribution within the generated programs themselves. We propose a novel test-time framework, which we dub probabilistic programs of thought, to obtain more samples from the model with fewer LLM generations. Given a program generated by a model and the associated next-token probabilities, we build a probabilistic program that compactly represents exponentially many deterministic programs. Since performing probabilistic reasoning in this probabilistic program is much cheaper than querying the LLM, our approach allows sampling new programs without any additional GPU compute and with little CPU overhead. We instantiate our approach on benchmarks for code generation, code understanding, and mathematical reasoning and report improvements in performance with fewer generations from the LLM.
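Drawing new programs from such a representation requires no further model calls, as the abstract describes. The sketch below assumes a hypothetical template format: a list whose items are either a fixed token string or a list of (token, probability) alternatives. It treats choice points as independent, which is a simplification and not necessarily how the authors perform inference. It then draws n programs on the CPU with Python's standard `random` module. The function name `sample_programs` and the toy template are illustrative only.

```python
import random
from collections import Counter


def sample_programs(template, n, seed=0):
    """Draw n concrete programs from a template of fixed tokens and choice points.

    template: list whose items are either a str (fixed token) or a list of
    (token, prob) pairs (a choice point). Hypothetical format for illustration.
    Each choice point is sampled independently; no LLM call is made.
    """
    rng = random.Random(seed)  # seeded generator for reproducible samples
    programs = []
    for _ in range(n):
        pieces = []
        for item in template:
            if isinstance(item, str):
                pieces.append(item)  # fixed token: copied unchanged
            else:
                tokens, weights = zip(*item)
                pieces.append(rng.choices(tokens, weights=weights, k=1)[0])
        programs.append("".join(pieces))
    return programs


if __name__ == "__main__":
    # One LLM generation with two uncertain positions -> 2 * 2 = 4 candidate programs.
    template = ["result = ", [("3", 0.6), ("4", 0.4)], " * ", [("7", 0.5), ("8", 0.5)]]
    samples = sample_programs(template, n=1000)
    print(Counter(samples).most_common())
```

Here a single GPU generation yields 1000 CPU samples over four distinct candidates. A downstream step, such as running tests on code or aggregating answers for math problems, would then select among them; that step is not shown and depends on the task.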

Problem

Research questions and friction points this paper is trying to address.

probabilistic programs

code generation

large language models

sampling efficiency

mathematical reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic Programs of Thought

LLM Sampling Efficiency

Next-Token Probabilities