Teaching Language Models to Reason with Tools

📅 2025-10-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large reasoning models (LRMs) often struggle to reconcile their internal probabilistic reasoning with the deterministic results of external tools such as code interpreters, which leads to inefficient or erroneous tool invocation in complex mathematical reasoning. To address this, we propose CoRT, a post-training framework, together with Hint-Engineering, a data synthesis strategy. The two jointly model and optimize *when*, *how*, and *in what multi-turn interaction pattern* tools should be invoked. CoRT combines supervised fine-tuning with rejection sampling and reinforcement learning. Evaluated on five mathematical reasoning benchmarks, our approach yields absolute accuracy improvements of 4% for the 32B model and 8% for the 1.5B model, while reducing token usage by approximately 30% and 50%, respectively, compared to pure natural language reasoning baselines. The shorter outputs lower inference cost at deployment.

📝 Abstract

Large reasoning models (LRMs) like OpenAI-o1 have shown impressive capabilities in natural language reasoning. However, these models frequently demonstrate inefficiencies or inaccuracies when tackling complex mathematical operations. While integrating computational tools such as Code Interpreters (CIs) offers a promising solution, it introduces a critical challenge: a conflict between the model's internal, probabilistic reasoning and the external, deterministic knowledge provided by the CI, which often leads models to unproductive deliberation. To overcome this, we introduce CoRT (Code-Optimized Reasoning Training), a post-training framework designed to teach LRMs to effectively utilize CIs. We propose Hint-Engineering, a new data synthesis strategy that strategically injects diverse hints at optimal points within reasoning paths. This approach generates high-quality, code-integrated reasoning data specifically tailored to optimize LRM-CI interaction. Using this method, we have synthesized 30 high-quality samples to post-train models ranging from 1.5B to 32B parameters through supervised fine-tuning. CoRT further refines the multi-round interleaving of external CI usage and internal thinking by employing rejection sampling and reinforcement learning. Our experimental evaluations demonstrate CoRT's effectiveness, yielding absolute improvements of 4% and 8% on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B, respectively, across five challenging mathematical reasoning datasets. Moreover, CoRT significantly enhances efficiency, reducing token usage by approximately 30% for the 32B model and 50% for the 1.5B model compared to pure natural language reasoning baselines. The models and code are available at: https://github.com/ChengpengLi1003/CoRT.
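As a rough illustration of what "injecting a hint at a point within a reasoning path" could look like, the sketch below inserts a hint sentence into a list of reasoning steps. The hint wording, the keyword heuristic for choosing the insertion point, and the function names are all assumptions made for this example; the abstract does not specify how CoRT selects hints or positions.

```python
from typing import List, Sequence

# Hypothetical hint text; the source does not give the actual wording.
CODE_HINT = "This calculation is tedious; let me verify it with Python."


def first_heavy_step(
    steps: Sequence[str],
    keywords: Sequence[str] = ("compute", "calculate", "evaluate"),
) -> int:
    """Toy stand-in for 'optimal point': index of the first step that
    mentions a calculation keyword, or len(steps) if none does."""
    for i, step in enumerate(steps):
        if any(k in step.lower() for k in keywords):
            return i
    return len(steps)


def inject_hint(steps: Sequence[str], index: int, hint: str = CODE_HINT) -> List[str]:
    """Return a new list with `hint` inserted before position `index`."""
    if not 0 <= index <= len(steps):
        raise ValueError("index out of range")
    return list(steps[:index]) + [hint] + list(steps[index:])


trace = [
    "Set up the equation for the total.",
    "Compute 123456 * 789 by hand.",
    "State the final answer.",
]
hinted = inject_hint(trace, first_heavy_step(trace))
```

Here the hint lands immediately before the step that would otherwise do long arithmetic in natural language, nudging the continuation toward a code call instead.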

Problem

Research questions and friction points this paper is trying to address.

Teaching language models to use computational tools effectively

Resolving conflicts between probabilistic reasoning and deterministic tools

Optimizing multi-round interaction between models and code interpreters
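The multi-round interaction between a model and a code interpreter can be pictured as a simple loop: the model writes text, any code it emits is executed, and the output is appended before the model continues. The sketch below is a minimal version under stated assumptions: the `<code>` and `<output>` tags, the `toy_model` stub, and the unsandboxed `exec` call are illustrative choices, not the interface used by CoRT.

```python
import contextlib
import io
import re
from typing import Callable

# Hypothetical delimiters for tool calls and tool results.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_code(source: str) -> str:
    """Execute `source` and return what it printed.
    No sandboxing: for illustration only."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def solve(generate: Callable[[str], str], question: str, max_rounds: int = 4) -> str:
    """Alternate between model text and interpreter output."""
    transcript = question
    for _ in range(max_rounds):
        chunk = generate(transcript)
        transcript += "\n" + chunk
        match = CODE_RE.search(chunk)
        if match is None:
            break  # the model answered without calling the tool
        transcript += "\n<output>" + run_code(match.group(1)) + "</output>"
    return transcript


def toy_model(transcript: str) -> str:
    """Stub model: calls the tool once, then reports the tool's result."""
    if "<output>" not in transcript:
        return "I will compute this with code. <code>print(17 * 23)</code>"
    result = transcript.rsplit("<output>", 1)[1].split("</output>", 1)[0]
    return f"The answer is {result}."


transcript = solve(toy_model, "What is 17 * 23?")
```

The `max_rounds` cap is one simple way to bound unproductive back-and-forth; a real system would also need sandboxing and timeouts around execution.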

Innovation

Methods, ideas, or system contributions that make the work stand out.

CoRT framework teaches models to use tools effectively

Hint-Engineering injects strategic hints into reasoning paths

Rejection sampling and reinforcement learning refine the multi-round interleaving of tool usage and internal thinking
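Rejection sampling, in its generic form, draws several candidate trajectories per question and keeps only those that pass a filter. The sketch below uses final-answer correctness as the filter; that criterion, the `Answer:` convention, and the deterministic `make_toy_sampler` stub are assumptions for this example, since the abstract does not state which filters CoRT applies.

```python
from itertools import cycle
from typing import Callable, List


def extract_answer(trajectory: str) -> str:
    """Return the text after the last 'Answer:' marker (hypothetical format)."""
    return trajectory.rsplit("Answer:", 1)[-1].strip()


def rejection_sample(
    sample: Callable[[str], str],
    question: str,
    reference: str,
    k: int = 8,
) -> List[str]:
    """Draw k trajectories and keep those whose final answer matches `reference`."""
    kept = []
    for _ in range(k):
        trajectory = sample(question)
        if extract_answer(trajectory) == reference:
            kept.append(trajectory)
    return kept


def make_toy_sampler(answers: List[str]) -> Callable[[str], str]:
    """Stub sampler that cycles through fixed answers instead of calling a model."""
    stream = cycle(answers)

    def sample(question: str) -> str:
        return f"{question} Some reasoning. Answer: {next(stream)}"

    return sample


sampler = make_toy_sampler(["391", "381", "391"])
kept = rejection_sample(sampler, "What is 17 * 23?", reference="391", k=6)
```

The kept trajectories would then serve as fine-tuning data; additional filters (for example, on trajectory length or tool-call behavior) could be added in the same loop.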
