Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models

📅 2025-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the excessive computational overhead and low efficiency caused by lengthy chain-of-thought (CoT) generation in large language models (LLMs) on complex reasoning tasks, this paper proposes DR. SAF, the Dynamic Reasoning-Boundary Self-Awareness Framework. DR. SAF abandons the conventional reliance on handcrafted difficulty priors and instead introduces three core mechanisms (Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism), enabling LLMs to dynamically perceive and regulate their own reasoning depth. Experiments demonstrate a 49.27% reduction in total response tokens, a 6.59× improvement in token efficiency, a 5× decrease in training time, and an accuracy gain of over 16% under extreme training conditions. The key contribution is enabling LLMs to perceive reasoning difficulty endogenously and adaptively, coupled with controllable optimization of their reasoning boundaries.

📝 Abstract
Recent advancements in large language models (LLMs) have greatly improved their capabilities on complex reasoning tasks through Long Chain-of-Thought (CoT). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. To improve efficiency, current methods often rely on human-defined difficulty priors, which do not align with the LLM's self-assessed difficulty, leading to inefficiencies. In this paper, we introduce the Dynamic Reasoning-Boundary Self-Awareness Framework (DR. SAF), which enables models to dynamically assess and adjust their reasoning depth in response to problem complexity. DR. SAF integrates three key components: Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism. These components allow models to optimize their reasoning processes, balancing efficiency and accuracy without compromising performance. Our experimental results demonstrate that DR. SAF achieves a 49.27% reduction in total response tokens with minimal loss in accuracy. The framework also delivers a 6.59x gain in token efficiency and a 5x reduction in training time, making it well-suited to resource-limited settings. During extreme training, DR. SAF can even surpass traditional instruction-based models in token efficiency with more than 16% accuracy improvement.
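The adaptive-reward idea from the abstract can be illustrated with a minimal sketch. Note this is an assumption-laden toy, not the paper's actual reward formulation: the function name, the per-problem token budget, and the linear penalty shape below are all invented for illustration. The intuition it captures is that a correct answer earns full reward only when the response stays within a length budget the model would derive from its own difficulty assessment.

```python
# Toy sketch only: DR. SAF's real Adaptive Reward Management is not
# reproduced here. This illustrates the general idea of rewarding
# correctness while discounting responses that overshoot a
# (self-assessed) token budget.

def adaptive_reward(correct: bool, n_tokens: int, budget: int) -> float:
    """Return a reward in [0, 1]: full credit for a correct answer within
    budget, linearly discounted credit for overshooting, zero if wrong."""
    if not correct:
        return 0.0
    if n_tokens <= budget:
        return 1.0
    # Linear penalty proportional to relative overshoot, floored at 0.1
    # so a correct-but-long answer still beats an incorrect one.
    overshoot = (n_tokens - budget) / budget
    return max(0.1, 1.0 - 0.5 * overshoot)
```

Under this toy scheme, a concise correct answer scores 1.0, while doubling the budget halves the reward, pushing the policy toward shorter reasoning on problems it judges easy.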
Problem

Research questions and friction points this paper is trying to address.

Reducing redundancy in LLM reasoning processes
Aligning model self-awareness with difficulty assessment
Dynamically adjusting reasoning depth for efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic reasoning depth adjustment for efficiency
Self-awareness alignment with adaptive reward management
Boundary preservation mechanism balancing accuracy and efficiency
Qiguang Chen
Harbin Institute of Technology
Chain-of-Thought · Reasoning · Multilingual LLM · Multi-modal LLM

Dengyun Peng
Harbin Institute of Technology

Jinhao Liu
Harbin Institute of Technology
Chain-of-Thought · Reasoning · Natural Language Processing

HuiKang Su
LARG, Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology

Jiannan Guan
LARG, Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology

Libo Qin
School of Computer Science and Engineering, Central South University

Wanxiang Che
Professor, Harbin Institute of Technology
Natural Language Processing