Agentic-R1: Distilled Dual-Strategy Reasoning

📅 2025-07-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing long-chain-of-thought (long-CoT) models rely on error-prone natural language reasoning, while tool-augmented agents excel at arithmetic execution but struggle with complex logical reasoning. Method: We propose DualDistill, a novel training framework that enables dual-strategy collaborative distillation for the first time. It fuses reasoning traces from heterogeneous teacher models—text-based logical reasoners and code-execution-based calculators—to train a unified student model capable of dynamically selecting the optimal solving path per query: invoking external tools for computation-intensive tasks and applying natural language reasoning for abstract logical problems. Contribution/Results: DualDistill integrates knowledge distillation, tool augmentation, and dynamic strategy selection into a single efficient architecture. Evaluated on GSM8K, MATH, and ProofWriter, it achieves significant improvements in both accuracy and inference efficiency, demonstrating the effectiveness of adaptive multi-strategy fusion for enhancing robustness in mathematical and logical reasoning.

Technology Category

Application Category

📝 Abstract
Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning. Our project is available at https://github.com/StigLidu/DualDistill
Problem

Research questions and friction points this paper is trying to address.

Slow and error-prone natural language reasoning in long-CoT models
Tool-augmented agents struggle with complex logical tasks
Need for unified model handling arithmetic and abstract reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

DualDistill framework distills multiple reasoning strategies
Agentic-R1 dynamically selects optimal reasoning strategy
Combines tool execution and text-based reasoning
🔎 Similar Papers
No similar papers found.
Weihua Du
Weihua Du
LTI, Carnegie Mellon University
language modelsreinforcement learningembodied AI
P
Pranjal Aggarwal
Language Technologies Institute, Carnegie Mellon University
S
Sean Welleck
Language Technologies Institute, Carnegie Mellon University
Y
Yiming Yang
Language Technologies Institute, Carnegie Mellon University