🤖 AI Summary
This work addresses the uncontrolled computational overhead of chain-of-thought reasoning in large language models (LLMs). We propose ThinkDial, the first open-source end-to-end controllable reasoning framework, which enables dynamic switching among discrete high-, medium-, and low-effort reasoning modes. Methodologically, it integrates budget-mode supervised fine-tuning with a two-phase, budget-aware reinforcement learning (RL) pipeline that uses adaptive reward shaping to internalize controllability throughout the reasoning process. Experiments show that, relative to baseline full-effort inference: (i) Medium mode reduces token consumption by 50% with <10% performance degradation; (ii) Low mode achieves a 75% token reduction with <15% degradation; and (iii) the framework generalizes well to out-of-distribution tasks. The core contribution is the first open implementation of a gpt-oss-style tunable reasoning mechanism, enabling explicit trade-offs between inference cost and accuracy.
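The discrete-mode interface described above can be sketched as follows. This is a minimal illustration, not the paper's actual API: the mode names' budget fractions follow the reported 50%/75% reductions, but the control-tag format and function names are assumptions.

```python
# Hypothetical sketch of discrete reasoning-effort control.
# The <effort:...> tag format is an assumption, not ThinkDial's interface.

BUDGET_FRACTION = {"high": 1.0, "medium": 0.5, "low": 0.25}

def build_prompt(question: str, mode: str = "high") -> str:
    """Prefix the query with a discrete reasoning-effort tag."""
    if mode not in BUDGET_FRACTION:
        raise ValueError(f"unknown mode: {mode}")
    return f"<effort:{mode}>\n{question}"

def expected_budget(full_budget_tokens: int, mode: str) -> int:
    """Target thinking-token budget implied by the selected mode."""
    return int(full_budget_tokens * BUDGET_FRACTION[mode])
```

For example, with a 4096-token full-effort budget, `expected_budget(4096, "medium")` targets 2048 thinking tokens, matching the reported 50% reduction.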
📝 Abstract
Large language models (LLMs) with chain-of-thought reasoning have demonstrated remarkable problem-solving capabilities, but controlling their computational effort remains a significant challenge for practical deployment. Recent proprietary systems like OpenAI's gpt-oss series have introduced discrete operational modes for intuitive reasoning control, but the open-source community has struggled to achieve such capabilities. In this paper, we introduce ThinkDial, the first open-recipe end-to-end framework that successfully implements gpt-oss-style controllable reasoning through discrete operational modes. Our system enables seamless switching among three distinct reasoning regimes: High mode (full reasoning capability), Medium mode (50% token reduction with <10% performance degradation), and Low mode (75% token reduction with <15% performance degradation). We achieve this through an end-to-end training paradigm that integrates budget-mode control throughout the entire pipeline: budget-mode supervised fine-tuning that embeds controllable reasoning capabilities directly into the learning process, and two-phase budget-aware reinforcement learning with adaptive reward shaping. Extensive experiments demonstrate that ThinkDial achieves the target compression-performance trade-offs, with clear response-length reductions while maintaining performance thresholds. The framework also exhibits strong generalization on out-of-distribution tasks.
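One way to picture the budget-aware RL objective is a correctness reward with a soft penalty for overshooting the mode's token budget. The sketch below is an illustrative assumption, not the paper's reward function: the penalty coefficient, the linear overshoot term, and the cap are all placeholders for whatever adaptive shaping the two-phase pipeline actually uses.

```python
# Hedged sketch of budget-aware reward shaping: full reward for a correct
# answer within budget, with reward scaled down as the response exceeds
# the mode's token budget. Coefficients are illustrative, not the paper's.

def shaped_reward(correct: bool, used_tokens: int, budget_tokens: int,
                  alpha: float = 0.5) -> float:
    """Correctness reward minus a capped penalty for exceeding the budget."""
    base = 1.0 if correct else 0.0
    # Fractional overshoot: 0.0 when within budget, 0.5 at 150% of budget, etc.
    overshoot = max(0.0, used_tokens / budget_tokens - 1.0)
    return base - alpha * min(overshoot, 1.0)  # penalty capped at alpha
```

Under this shaping, a correct answer within budget scores 1.0, while a correct answer at 150% of budget scores 0.75, nudging the policy toward shorter reasoning traces without flipping the sign of correctness.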