CoLT: Reasoning with Chain of Latent Tool Calls

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing implicit reasoning approaches typically require modifications to the main model architecture and extensive training, limiting their generality and efficiency. This work proposes CoLT, a novel framework that, for the first time, formulates implicit reasoning as a triggerable tool call: the primary model generates seed tokens encoding reasoning cues, which a lightweight external model then decodes into full reasoning steps, eliminating the need for architectural changes to the main model. CoLT supports hidden state propagation and flexible decoder design, and is compatible with reinforcement learning-based training. Evaluated on four mathematical reasoning benchmarks, CoLT achieves higher accuracy while significantly reducing reasoning length, effectively balancing efficiency, generality, and explicit reasoning capability.

📝 Abstract
Chain-of-Thought (CoT) is a critical technique in enhancing the reasoning ability of Large Language Models (LLMs), and latent reasoning methods have been proposed to accelerate the inefficient token-level reasoning chain. We notice that existing latent reasoning methods generally require model structure augmentation and exhaustive training, limiting their broader applicability. In this paper, we propose CoLT, a novel framework that implements latent reasoning as "tool calls". Instead of reasoning entirely in the latent space, CoLT generates seed tokens that contain the information of a reasoning step. When a latent tool call is triggered, a smaller external model will take the hidden states of seed tokens as its input, and unpack the seed tokens back to a full reasoning step. In this way, we can ensure that the main model reasons in the explicit token space, preserving its ability while improving efficiency. Experimental results on four mathematical datasets demonstrate that CoLT achieves higher accuracy and shorter reasoning length than baseline latent models, and is compatible with reinforcement learning algorithms and different decoder structures.
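The generation loop described in the abstract (main model emits seed tokens, a triggered latent tool call hands their hidden states to a smaller decoder that unpacks a full reasoning step) can be sketched roughly as below. This is a minimal toy sketch, not the paper's implementation: the trigger token, the stub models, and all function names (`main_model_step`, `lightweight_decoder`, `colt_generate`) are illustrative assumptions.

```python
# Toy sketch of the CoLT-style inference loop from the abstract.
# Real models are replaced by stubs; only the control flow is illustrated.

TOOL_TRIGGER = "<latent>"  # assumed special token that triggers the latent tool call
SEED_COUNT = 2             # assumed number of seed tokens per compressed reasoning step

def main_model_step(token):
    """Stub for the primary LLM: either emits a normal token, or emits the
    trigger plus seed-token 'hidden states' carrying compressed reasoning cues."""
    if token.endswith("?"):  # toy condition standing in for "a reasoning step is needed"
        seeds = [("h", i) for i in range(SEED_COUNT)]  # stand-ins for hidden states
        return TOOL_TRIGGER, seeds
    return token.upper(), None

def lightweight_decoder(seed_hidden_states):
    """Stub for the smaller external model that unpacks seed-token hidden
    states back into an explicit, full-text reasoning step."""
    return "step(" + ",".join(f"{tag}{i}" for tag, i in seed_hidden_states) + ")"

def colt_generate(tokens):
    """Main loop: the primary model keeps reasoning in explicit token space;
    each triggered latent tool call is expanded by the external decoder."""
    output = []
    for tok in tokens:
        emitted, seeds = main_model_step(tok)
        if emitted == TOOL_TRIGGER:
            output.append(lightweight_decoder(seeds))  # unpack seeds explicitly
        else:
            output.append(emitted)
    return output
```

The key property the sketch tries to mirror is that the main model never has to consume raw latent vectors: the decoder's output re-enters the chain as ordinary text, which is what lets CoLT avoid changes to the main model's architecture.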
Problem

Research questions and friction points this paper is trying to address.

latent reasoning
Chain-of-Thought
Large Language Models
reasoning efficiency
tool calls
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought
Latent Reasoning
Tool Calls
Efficient Inference
Large Language Models
Fangwei Zhu (Peking University)
Zhifang Sui (School of Computer Science, State Key Laboratory of Multimedia Information Processing, Peking University)