🤖 AI Summary
Existing long chain-of-thought (long-CoT) models rely on slow, error-prone natural language reasoning, while tool-augmented agents excel at arithmetic execution but struggle with abstract logical reasoning. Method: DualDistill is a fine-tuning framework that fuses reasoning trajectories from heterogeneous teacher models (text-based logical reasoners and code-execution-based agents) to train a unified student model, Agentic-R1, which dynamically selects a solving strategy per query: invoking external tools for computation-intensive tasks and applying natural language reasoning for abstract problems. Contribution/Results: By combining knowledge distillation, tool augmentation, and dynamic strategy selection in a single model, DualDistill improves accuracy on both computation-intensive and standard reasoning benchmarks while maintaining inference efficiency, demonstrating that adaptive multi-strategy distillation yields robust mathematical and logical reasoning.
📝 Abstract
Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning. Our project is available at https://github.com/StigLidu/DualDistill
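To make the multi-teacher distillation idea concrete, here is a minimal sketch of how training examples might be composed from two complementary teachers. This is an illustrative assumption, not the paper's exact procedure: the function names, the trace format, and the rule of preferring a verified tool-based trace over a text-based one are all hypothetical.

```python
# Hedged sketch of dual-strategy distillation data construction.
# Assumed setup: each teacher returns a final answer and a reasoning
# trace; we keep whichever trace reached the gold answer, preferring
# the tool-augmented teacher for verifiable computation.

def build_distillation_example(query, text_trace, tool_trace, gold_answer):
    """Select the teacher trace that reached the gold answer.

    text_trace / tool_trace: dicts with "answer" and "trace" keys.
    Returns a supervised example for the student, or None if neither
    teacher solved the problem (the example is dropped).
    """
    if tool_trace["answer"] == gold_answer:
        strategy, trace = "tool", tool_trace["trace"]
    elif text_trace["answer"] == gold_answer:
        strategy, trace = "text", text_trace["trace"]
    else:
        return None
    # The student is trained on the full trace, so it implicitly learns
    # which strategy suits which kind of query.
    return {"prompt": query, "strategy": strategy, "target": trace}

examples = [
    build_distillation_example(
        "What is 17 * 24?",
        text_trace={"answer": "408", "trace": "17*24 = 17*20 + 17*4 = 408"},
        tool_trace={"answer": "408", "trace": "```python\nprint(17 * 24)\n```"},
        gold_answer="408",
    ),
]
```

In this sketch, a computation-heavy query whose tool trace verifies against the gold answer is distilled with the code-execution strategy; queries only the text teacher solves fall back to natural language traces, giving the student both behaviors.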