Effectively Controlling Reasoning Models through Thinking Intervention

📅 2025-03-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Reasoning-enhanced large language models (LLMs) are usually steered only through the input prompt, which gives coarse-grained control and no way to intervene in the reasoning process itself. This paper proposes the “Thinking Intervention” paradigm, which moves the control point from the input prompt to the model’s reasoning chain: specific thinking tokens are inserted or revised so that intermediate reasoning steps are guided explicitly. The method requires no fine-tuning or additional training, which keeps deployment lightweight and makes it broadly applicable. Evaluated on four benchmarks (IFEval, SEP, XSTest, and SORRY-Bench) with open-source DeepSeek R1 models, it improves on baseline prompting by up to 6.7% in instruction-following accuracy, 15.4% in reasoning about instruction hierarchies, and 40.0% in refusal rate for unsafe prompts.

📝 Abstract

Reasoning-enhanced large language models (LLMs) explicitly generate intermediate reasoning steps prior to generating final answers, helping the model excel in complex problem-solving. In this paper, we demonstrate that this emerging generation framework offers a unique opportunity for more fine-grained control over model behavior. We propose Thinking Intervention, a novel paradigm designed to explicitly guide the internal reasoning processes of LLMs by strategically inserting or revising specific thinking tokens. We conduct comprehensive evaluations across multiple tasks, including instruction following on IFEval, instruction hierarchy on SEP, and safety alignment on XSTest and SORRY-Bench. Our results demonstrate that Thinking Intervention significantly outperforms baseline prompting approaches, achieving up to 6.7% accuracy gains in instruction-following scenarios, 15.4% improvements in reasoning about instruction hierarchies, and a 40.0% increase in refusal rates for unsafe prompts using open-source DeepSeek R1 models. Overall, our work opens a promising new research avenue for controlling reasoning LLMs.
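The insertion side of the idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the model wraps its reasoning in a `<think>` tag and that the serving stack accepts a raw text prefix to continue from. The function name `build_intervened_prompt` and the `User: ` / `Assistant: ` role prefixes are hypothetical placeholders, not a real chat template.

```python
# Minimal sketch of inserting a thinking intervention (illustrative only).
# Assumption: the reasoning model opens its reasoning with a "<think>" tag.
THINK_OPEN = "<think>"


def build_intervened_prompt(
    user_prompt: str,
    intervention: str,
    user_prefix: str = "User: ",            # hypothetical role marker
    assistant_prefix: str = "Assistant: ",  # hypothetical role marker
) -> str:
    """Return a text prefix whose reasoning block already begins with
    the intervention, so the model continues thinking from that point."""
    return (
        f"{user_prefix}{user_prompt}\n"
        f"{assistant_prefix}{THINK_OPEN}\n"
        f"{intervention.strip()}\n"
    )


if __name__ == "__main__":
    prefix = build_intervened_prompt(
        "Summarize the text in exactly three bullet points.",
        "I should follow every formatting constraint in the request.",
    )
    # The reasoning block is left open on purpose: the model is expected
    # to generate the rest of its thinking and then the final answer.
    print(prefix)
```

The design choice here is that the guidance is placed inside the reasoning block rather than appended to the user prompt, which is the shift in control point that the abstract describes. Revising tokens in the middle of an ongoing reasoning chain would require control over the decoding loop and is not shown.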

Problem

Research questions and friction points this paper is trying to address.

Control reasoning steps in large language models

Improve model behavior via thinking token intervention

Enhance accuracy and safety in complex problem-solving

Innovation

Methods, ideas, or system contributions that make the work stand out.

Guides reasoning via strategic token insertion

Enhances control over model's internal processes

Improves accuracy and safety in LLM outputs
