D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses lazy reasoning in large reasoning models (LRMs) on complex tool-use tasks, a failure mode that often stems from an inability to decompose tasks into manageable subtasks. To overcome this limitation, the authors propose D-CORE, a two-stage training framework that first employs self-distillation to elicit the model's intrinsic task-decomposition capability and then applies diversity-aware reinforcement learning to restore and enhance reflective reasoning. The study is the first to integrate explicit task decomposition with a compositional reasoning mechanism into the training of LRMs, yielding significant gains in generalization. Experimentally, D-CORE-8B achieves 77.7% accuracy on BFCLv3, outperforming the previous best 8B model by 5.7%, while D-CORE-14B sets a new state of the art at 79.3%, surpassing even 70B-scale models.

📝 Abstract
Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack the capability of sub-task decomposition in complex tool-use scenarios, leading to Lazy Reasoning. To address this, we propose D-CORE (Decomposing tasks and Composing Reasoning processes), a two-stage training framework that first incentivizes the LRMs' task-decomposition reasoning capability via self-distillation, followed by diversity-aware reinforcement learning (RL) to restore the LRMs' reflective reasoning capability. D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Experiments on BFCLv3 demonstrate the superiority of our method: D-CORE-8B reaches 77.7% accuracy, surpassing the best-performing 8B model by 5.7%. Meanwhile, D-CORE-14B establishes a new state of the art at 79.3%, outperforming 70B models despite being 5× smaller. The source code is available at https://github.com/alibaba/EfficientAI.
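
To make the two-stage recipe concrete, below is a minimal Python sketch of how such a pipeline could be wired together, reconstructed from the abstract alone. Everything here is an assumption: `Trace`, `model_generate`, the "Sub-task" decomposition marker, and the reward weights are hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch of D-CORE's two-stage recipe (assumptions throughout).
from dataclasses import dataclass, field

@dataclass
class Trace:
    prompt: str
    reasoning: str                                   # chain of thought, incl. decomposition
    tool_calls: list = field(default_factory=list)   # emitted tool invocations
    correct: bool = False                            # passed answer verification

def self_distill(model_generate, prompts, n_samples=8):
    """Stage 1 (assumed): sample the model on each prompt and keep only its
    own traces that both decompose the task and solve it; these become the
    SFT corpus that incentivizes decomposition ("self-distillation")."""
    sft_corpus = []
    for prompt in prompts:
        for _ in range(n_samples):
            trace = model_generate(prompt)
            if trace.correct and "Sub-task" in trace.reasoning:
                sft_corpus.append(trace)
    return sft_corpus

def diversity_aware_reward(trace, group):
    """Stage 2 (assumed): correctness reward plus a small bonus for rollouts
    whose tool-call sequence is rare within the sampled group, discouraging
    reflective reasoning from collapsing into one dominant pattern."""
    base = 1.0 if trace.correct else 0.0
    duplicates = sum(t.tool_calls == trace.tool_calls for t in group)
    return base + 0.1 * (1.0 - duplicates / len(group))
```

In a full pipeline, the output of `self_distill` would feed a supervised fine-tuning step, and `diversity_aware_reward` would replace the scalar reward inside a policy-gradient loop; both choices are guesses at the paper's design, not details confirmed by it.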
Problem

Research questions and friction points this paper is trying to address.

task decomposition
large reasoning models
complex tool use
lazy reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

task decomposition
self-distillation
diversity-aware reinforcement learning
tool use
large reasoning models
👥 Authors
Bowen Xu
Alibaba Cloud Computing, Alibaba Group
Shaoyu Wu
Alibaba Cloud Computing, Alibaba Group
Hao Jiang
Alibaba Group
Kai Liu
Unknown affiliation
Xin Chen
Alibaba Cloud Computing, Alibaba Group
Lulu Hu
Alibaba Cloud Computing, Alibaba Group
Bin Yang
Alibaba Cloud Computing, Alibaba Group