🤖 AI Summary
To address the excessive computational overhead and low efficiency that lengthy chain-of-thought (CoT) generation causes in complex reasoning tasks, this paper proposes DR.SAF, a Dynamic Reasoning-Boundary Self-Awareness Framework for large language models (LLMs). DR.SAF abandons the conventional paradigm of relying on handcrafted difficulty priors and instead introduces three core mechanisms — Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism — that enable LLMs to dynamically perceive and regulate their own reasoning depth. Experiments demonstrate a 49.27% reduction in total response tokens with minimal accuracy loss, a 6.59× improvement in token efficiency, a 5× decrease in training time, and an accuracy gain of more than 16% under extreme training conditions. The key contribution is endogenous, adaptive perception of reasoning difficulty by the LLM itself, coupled with controllable optimization of its reasoning boundaries.
📝 Abstract
Recent advancements in large language models (LLMs) have greatly improved their capabilities on complex reasoning tasks through Long Chain-of-Thought (CoT). However, this approach often produces substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. To improve efficiency, current methods often rely on human-defined difficulty priors, which do not align with the LLM's self-perceived difficulty, leading to inefficiencies. In this paper, we introduce the Dynamic Reasoning-Boundary Self-Awareness Framework (DR.SAF), which enables models to dynamically assess and adjust their reasoning depth in response to problem complexity. DR.SAF integrates three key components: Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism. These components allow models to optimize their reasoning processes, balancing efficiency and accuracy without compromising performance. Our experimental results demonstrate that DR.SAF achieves a 49.27% reduction in total response tokens with minimal loss in accuracy. The framework also delivers a 6.59× gain in token efficiency and a 5× reduction in training time, making it well-suited to resource-limited settings. Under extreme training, DR.SAF can even surpass traditional instruction-based models in token efficiency while improving accuracy by more than 16%.
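The abstract does not specify the reward formulation behind Adaptive Reward Management. As a rough, hedged illustration of the general idea — rewarding correct answers while penalizing reasoning that overshoots a per-problem token budget — the sketch below shows one plausible shape such a reward could take. The function name, the linear penalty, and the `alpha` coefficient are all assumptions for illustration, not the paper's actual method:

```python
def length_adaptive_reward(correct: bool, n_tokens: int,
                           budget: int, alpha: float = 0.001) -> float:
    """Toy length-aware reward: full credit for a correct answer,
    minus a linear penalty for tokens spent beyond the budget.

    This is an illustrative sketch, NOT DR.SAF's reward function.
    """
    base = 1.0 if correct else 0.0          # correctness term
    overflow = max(0, n_tokens - budget)    # tokens past the budget
    return base - alpha * overflow          # penalize only the excess


# A concise correct answer keeps its full reward; a verbose one is docked.
print(length_adaptive_reward(True, 150, budget=200))   # within budget
print(length_adaptive_reward(True, 700, budget=200))   # 500 tokens over
```

In an RL fine-tuning loop, a reward of this shape pressures the policy toward shorter chains of thought on easy problems while leaving headroom (via a larger budget) for genuinely hard ones, which matches the paper's stated goal of difficulty-adaptive reasoning depth.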