Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering

📅 2026-01-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work targets the suboptimal performance of large reasoning models, which often stems from computational redundancy and unfaithful reasoning, and notes that existing behavior-shaping approaches rely on costly supervision from ground-truth reasoning trajectories. The study introduces a method that uncovers the model’s internal “reasoning beliefs” via logit probing and aligns them with a target belief blueprint through lightweight fine-tuning on synthetically generated self-reflective question-answer pairs. The approach requires no supervision from real reasoning trajectories, yet it effectively steers reasoning behavior, matching or surpassing the reasoning efficiency and faithfulness of current methods based on behavioral supervision or preference learning while substantially reducing training costs.
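The summary says reasoning beliefs are extracted via logit probing but does not spell out the procedure. As a minimal sketch, one plausible reading is to pose a self-reflective yes/no question and compare the probabilities the model assigns to affirmative versus negative answer tokens; the model name, prompt wording, and Yes/No readout below are illustrative assumptions, not the paper's exact protocol.

```python
# Hypothetical sketch of logit probing for a "reasoning belief".
# Model name, prompt wording, and the Yes/No readout are assumptions,
# not the protocol described in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder reasoning model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def probe_belief(statement: str) -> float:
    """Return P(Yes) - P(No) for a self-reflective question about the model's reasoning."""
    prompt = f"Question: {statement} Answer with Yes or No.\nAnswer:"
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    probs = torch.softmax(logits, dim=-1)
    yes_id = tok(" Yes", add_special_tokens=False).input_ids[0]
    no_id = tok(" No", add_special_tokens=False).input_ids[0]
    return (probs[yes_id] - probs[no_id]).item()

# Example: probe a belief about reasoning conciseness.
score = probe_belief("Do you keep your chain of thought as short as possible?")
print(f"belief score (Yes minus No probability): {score:+.3f}")
```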

📝 Abstract
Large reasoning models (LRMs) have achieved remarkable success in complex problem-solving, yet they often suffer from computational redundancy or reasoning unfaithfulness. Current methods for shaping LRM behavior typically rely on reinforcement learning or fine-tuning with gold-standard reasoning traces, a paradigm that is both computationally expensive and difficult to scale. In this paper, we reveal that LRMs possess latent “reasoning beliefs” that internally track their own reasoning traits, which can be captured through simple logit probing. Building upon this insight, we propose Reasoning Belief Engineering (RELIEF), a simple yet effective framework that shapes LRM behavior by aligning the model's self-concept with a target belief blueprint. Crucially, RELIEF completely bypasses the need for reasoning-trace supervision. It internalizes desired traits by fine-tuning on synthesized, self-reflective question-answering pairs that affirm the target belief. Extensive experiments on efficiency and faithfulness tasks demonstrate that RELIEF matches or outperforms behavior-supervised and preference-based baselines while requiring lower training costs. Further analysis validates that shifting a model's reasoning belief effectively shapes its actual behavior.
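The abstract states that RELIEF fine-tunes on synthesized self-reflective question-answering pairs that affirm the target belief, with no reasoning traces involved. Below is a minimal sketch of what such synthetic data could look like; the belief blueprint entries, question templates, and answer phrasing are made-up placeholders, and a standard chat-format supervised fine-tuning run on the resulting file would then serve as the lightweight fine-tuning step.

```python
# Hypothetical sketch: synthesizing self-reflective QA pairs that affirm
# a target belief blueprint. The blueprint entries and templates are
# illustrative assumptions, not the paper's actual data.
import json
import random

# Target belief blueprint: traits the shaped model should hold about itself.
belief_blueprint = [
    "I reason concisely and avoid redundant verification steps.",
    "My final answer always follows from the reasoning I actually wrote down.",
]

question_templates = [
    "How would you describe your own reasoning style?",
    "Do you re-check steps that you have already verified?",
    "Does your stated reasoning reflect how you reached your answer?",
]

def synthesize_pairs(n_per_belief: int = 2):
    """Build chat-style SFT examples whose answers affirm the target beliefs."""
    examples = []
    for belief in belief_blueprint:
        for _ in range(n_per_belief):
            q = random.choice(question_templates)
            a = f"Reflecting on my own process: {belief}"
            examples.append({
                "messages": [
                    {"role": "user", "content": q},
                    {"role": "assistant", "content": a},
                ]
            })
    return examples

# Write the synthetic dataset; a standard supervised fine-tuning run on
# these pairs would then be used to internalize the target belief.
with open("relief_sft_pairs.jsonl", "w") as f:
    for ex in synthesize_pairs():
        f.write(json.dumps(ex) + "\n")
```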
Problem

Research questions and friction points this paper is trying to address.

reasoning redundancy
reasoning unfaithfulness
reasoning supervision
large reasoning models
behavior shaping
Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning belief
belief engineering
logit probing
self-reflective fine-tuning
reasoning supervision-free