DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models employing slow, deliberative reasoning often suffer from “overthinking” on simple tasks, resulting in redundant inference steps and excessive computational cost; conversely, uniformly truncating reasoning length degrades performance on complex tasks. To address this, we propose a difficulty-adaptive slow-thinking framework. Our approach introduces Token Length Budget (TLB)—a lightweight, token-based metric for quantifying problem difficulty—and integrates length-aware reward shaping with preference optimization to dynamically adjust chain-of-thought length in a task-specific manner. Empirically, our method preserves accuracy on complex tasks while reducing average inference token consumption by over 30%. This yields substantial improvements in inference efficiency and cross-task generalization. Crucially, the framework is scalable and does not require architectural modifications or task-specific fine-tuning, offering a principled, deployable paradigm for efficient large-model reasoning.

📝 Abstract
Recent advancements in slow-thinking reasoning models have shown exceptional performance in complex reasoning tasks. However, these models often exhibit overthinking, generating redundant reasoning steps for simple problems and leading to excessive computational resource usage. While current mitigation strategies uniformly reduce reasoning tokens, they risk degrading performance on challenging tasks that require extended reasoning. This paper introduces Difficulty-Adaptive Slow-Thinking (DAST), a novel framework that enables models to autonomously adjust the length of Chain-of-Thought (CoT) based on problem difficulty. We first propose a Token Length Budget (TLB) metric to quantify difficulty, then leverage length-aware reward shaping and length preference optimization to implement DAST. DAST penalizes overlong responses for simple tasks while incentivizing sufficient reasoning for complex problems. Experiments on diverse datasets and model scales demonstrate that DAST effectively mitigates overthinking (reducing token usage by over 30% on average) while preserving reasoning accuracy on complex problems.
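The core idea, a difficulty-dependent token budget combined with length-aware reward shaping, can be sketched as follows. This is a hypothetical illustration, not the paper's actual formulation: the function names, the interpolation between a minimum and maximum budget by empirical pass rate, and the penalty coefficient `alpha` are all assumptions for the sake of the example.

```python
# Hypothetical sketch of difficulty-adaptive length reward shaping in the
# spirit of DAST; the exact formulas in the paper may differ.

def token_length_budget(pass_rate: float, l_min: int, l_max: int) -> float:
    """Map an estimated pass rate (assumed to come from sampling the model
    on the problem) to a token budget: easy problems (high pass rate) get
    a small budget, hard problems (low pass rate) a large one."""
    return pass_rate * l_min + (1.0 - pass_rate) * l_max

def shaped_reward(correct: bool, length: int, budget: float,
                  alpha: float = 0.5) -> float:
    """Correctness reward adjusted by how far the response exceeds the
    difficulty-dependent budget. Because hard problems receive a larger
    budget, long chains of thought are only penalized on easy problems."""
    base = 1.0 if correct else -1.0
    over = max(length - budget, 0.0) / budget  # relative budget overshoot
    return base - alpha * over
```

Under this sketch, a correct 200-token answer to an easy problem (budget 100) earns less reward than a correct 80-token answer, while the same 200-token answer to a hard problem (budget 2000) incurs no penalty at all.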
Problem

Research questions and friction points this paper is trying to address.

Reduces redundant reasoning steps in simple tasks
Preserves reasoning accuracy in complex tasks
Autonomously adjusts reasoning length based on difficulty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Difficulty-Adaptive Slow-Thinking (DAST) framework
Token Length Budget (TLB) metric
Length-aware reward shaping and optimization
Yi Shen
Unicom Data Intelligence, China Unicom; Data Science & Artificial Intelligence Research Institute, China Unicom
Jian Zhang
Unicom Data Intelligence, China Unicom; Data Science & Artificial Intelligence Research Institute, China Unicom
Jieyun Huang
Unicom Data Intelligence, China Unicom; Data Science & Artificial Intelligence Research Institute, China Unicom
Shuming Shi
Tencent AI Lab
Research interests: NLP, text understanding, knowledge mining, text generation, web search
Wenjing Zhang
Unicom Data Intelligence, China Unicom; Data Science & Artificial Intelligence Research Institute, China Unicom
Jiangze Yan
Unicom Data Intelligence, China Unicom; Data Science & Artificial Intelligence Research Institute, China Unicom
Ning Wang
Unicom Data Intelligence, China Unicom; Data Science & Artificial Intelligence Research Institute, China Unicom
Kai Wang
Unicom Data Intelligence, China Unicom; Data Science & Artificial Intelligence Research Institute, China Unicom
Shiguo Lian
CloudMinds