🤖 AI Summary
This work addresses the challenge of enhancing language models' reasoning capabilities. It proposes a variational inference (VI) framework that treats chain-of-thought (CoT) reasoning traces as latent variables and optimizes the evidence lower bound (ELBO) through a multi-trajectory variational objective based on the forward KL divergence. The method unifies rejection sampling and binary-reward reinforcement learning (e.g., GRPO) under a probabilistic lens, revealing their implicit importance-weighting mechanisms and exposing a systematic bias toward easier problems. Compared to standard RL approaches, the framework offers improved training stability and a fully differentiable objective. Experiments on Qwen 2.5 and Qwen 3 models demonstrate consistent and significant performance gains across diverse reasoning tasks, including mathematical and commonsense reasoning, while providing an interpretable, probabilistic modeling paradigm for large language model reasoning.
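To make the forward-KL idea concrete, here is a minimal sketch of a self-normalized importance-weighted estimator: traces are sampled from the variational posterior `q`, and each trace is reweighted by the model's joint likelihood of (trace, answer) relative to `q`. The function names and the plain surrogate-loss form are illustrative assumptions, not the paper's actual implementation.

```python
import math

def forward_kl_weights(log_p_joint, log_q):
    """Self-normalized importance weights for a forward-KL multi-trace objective.

    log_p_joint[k] : log p(z_k, y | x) under the model, for trace z_k
    log_q[k]       : log q(z_k | x, y) under the variational posterior

    With K traces z_k ~ q, the forward KL grad -E_{p(z|x,y)}[grad log q(z)]
    can be estimated using weights w_k proportional to p(z_k, y|x) / q(z_k|x, y).
    """
    log_w = [lp - lq for lp, lq in zip(log_p_joint, log_q)]
    m = max(log_w)  # subtract max for numerical stability
    w = [math.exp(x - m) for x in log_w]
    s = sum(w)
    return [x / s for x in w]

def forward_kl_surrogate(log_p_joint, log_q):
    """Weighted negative log-likelihood surrogate: minimizing this w.r.t.
    q's parameters (weights held fixed / stop-gradient) follows the
    estimated forward-KL gradient."""
    w = forward_kl_weights(log_p_joint, log_q)
    return -sum(wi * lq for wi, lq in zip(w, log_q))
```

Traces whose joint likelihood under the model exceeds their probability under `q` are up-weighted, pulling `q` toward the true posterior over reasoning traces rather than collapsing onto a single mode, which is the stability argument for the forward (rather than reverse) KL direction.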
📝 Abstract
We introduce a variational reasoning framework for language models that treats thinking traces as latent variables and optimizes them through variational inference. Starting from the evidence lower bound (ELBO), we extend it to a multi-trace objective for tighter bounds and propose a forward-KL formulation that stabilizes the training of the variational posterior. We further show that rejection sampling finetuning and binary-reward RL, including GRPO, can be interpreted as local forward-KL objectives, where an implicit weighting by model accuracy naturally arises from the derivation and reveals a previously unnoticed bias toward easier questions. We empirically validate our method on the Qwen 2.5 and Qwen 3 model families across a wide range of reasoning tasks. Overall, our work provides a principled probabilistic perspective that unifies variational inference with RL-style methods and yields stable objectives for improving the reasoning ability of language models. Our code is available at https://github.com/sail-sg/variational-reasoning.
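The implicit accuracy weighting mentioned above can be illustrated with a small sketch. With binary rewards, keeping only correct traces and averaging over all K samples gives each question a total gradient mass equal to its empirical accuracy, so easier questions dominate the update; GRPO-style group normalization rescales per-trace advantages by the group statistics. The function names and the population-standard-deviation convention are assumptions for illustration, not the paper's exact derivation.

```python
import math

def rejection_sampling_mass(rewards):
    """Total gradient mass a question receives under rejection-sampling-style
    finetuning with binary rewards: keep correct traces (r=1), average over
    all K draws, so each kept trace weighs 1/K and the total equals the
    empirical accuracy mean(r). Easier questions therefore get more mass."""
    return sum(rewards) / len(rewards)

def grpo_advantages(rewards):
    """GRPO-style group-normalized advantages A_k = (r_k - mean) / std for
    one group of binary rewards (population std; std=1 fallback when all
    rewards are identical, in which case all advantages are zero)."""
    K = len(rewards)
    mean = sum(rewards) / K
    var = sum((r - mean) ** 2 for r in rewards) / K
    std = math.sqrt(var) if var > 0 else 1.0
    return [(r - mean) / std for r in rewards]
```

For example, a question answered correctly in 3 of 4 samples receives three times the rejection-sampling gradient mass of one answered correctly in 1 of 4, which is the bias toward easier questions that the variational view makes explicit.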