ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

📅 2025-08-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the uncontrolled computational overhead of chain-of-thought reasoning in large language models (LLMs). We propose ThinkDial, the first open-source, end-to-end controllable reasoning framework, which enables dynamic switching among high-, medium-, and low-effort discrete reasoning modes. Methodologically, it integrates budget-mode supervised fine-tuning with a two-phase, budget-aware reinforcement learning (RL) pipeline that uses adaptive reward shaping to internalize controllability throughout the reasoning process. Experiments show that, relative to baseline full-effort inference: (i) Medium mode reduces token consumption by 50% with <10% performance degradation; (ii) Low mode achieves a 75% token reduction with <15% degradation; and (iii) the framework generalizes well to out-of-distribution tasks. The core contribution is the first open implementation of a gpt-oss-style tunable reasoning mechanism, enabling precise trade-offs between inference cost and accuracy.

📝 Abstract
Large language models (LLMs) with chain-of-thought reasoning have demonstrated remarkable problem-solving capabilities, but controlling their computational effort remains a significant challenge for practical deployment. Recent proprietary systems like OpenAI's gpt-oss series have introduced discrete operational modes for intuitive reasoning control, but the open-source community has largely failed to achieve such capabilities. In this paper, we introduce ThinkDial, the first open-recipe end-to-end framework that successfully implements gpt-oss-style controllable reasoning through discrete operational modes. Our system enables seamless switching between three distinct reasoning regimes: High mode (full reasoning capability), Medium mode (50 percent token reduction with <10 percent performance degradation), and Low mode (75 percent token reduction with <15 percent performance degradation). We achieve this through an end-to-end training paradigm that integrates budget-mode control throughout the entire pipeline: budget-mode supervised fine-tuning that embeds controllable reasoning capabilities directly into the learning process, and two-phase budget-aware reinforcement learning with adaptive reward shaping. Extensive experiments demonstrate that ThinkDial achieves target compression-performance trade-offs with clear response length reductions while maintaining performance thresholds. The framework also exhibits strong generalization capabilities on out-of-distribution tasks.
Problem

Research questions and friction points this paper is trying to address.

Controlling computational effort in LLM chain-of-thought reasoning
Implementing discrete operational reasoning modes
Reducing token consumption while maintaining performance thresholds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-recipe end-to-end framework for controllable reasoning
Three distinct operational modes with token reduction
Budget-aware reinforcement learning with adaptive reward shaping
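The budget-aware reward shaping named above can be illustrated with a minimal sketch: a correctness reward minus a soft penalty when a response exceeds its mode's token budget (50% of full effort for Medium, 25% for Low, as stated in the abstract). All function and variable names here, and the linear penalty form, are illustrative assumptions, not the paper's actual implementation.

```python
# Per-mode token budgets as a fraction of full-effort (High mode) length.
# High mode is unconstrained; fractions for Medium/Low follow the paper's
# stated 50% / 75% token-reduction targets.
BUDGETS = {"high": None, "medium": 0.5, "low": 0.25}

def shaped_reward(correct: bool, tokens_used: int, full_effort_tokens: int,
                  mode: str, penalty_weight: float = 0.5) -> float:
    """Task reward minus a soft penalty for overshooting the mode's budget.

    correct: whether the final answer was right (task reward of 1 or 0).
    tokens_used: length of the generated reasoning trace.
    full_effort_tokens: reference length of an unconstrained (High-mode) trace.
    """
    base = 1.0 if correct else 0.0
    budget_frac = BUDGETS[mode]
    if budget_frac is None:  # High mode: no length constraint
        return base
    budget = budget_frac * full_effort_tokens
    # Relative overshoot beyond the budget; zero if within budget.
    overshoot = max(0.0, (tokens_used - budget) / budget)
    return base - penalty_weight * overshoot
```

In a two-phase scheme like the one the summary describes, `penalty_weight` (or the penalty shape itself) would be adapted over training so the policy first learns to solve tasks, then to respect each budget.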
Authors

Qianyu He — Fudan University (Large Language Model, Reasoning, Instruction Following, Creative Generation)
Siyu Yuan — ByteDance Seed
Xuefeng Li — Shanghai Jiao Tong University
Mingxuan Wang — SIA-Lab of Tsinghua AIR and ByteDance Seed
Jiangjie Chen — ByteDance Seed (NLP, Machine Reasoning, Large Language Models, Autonomous Agent)