Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates whether large language models (LLMs) possess "off-trajectory reasoning" capability: the ability to recover from misleading reasoning steps injected into a shared inference trajectory (recoverability) and to build on correct reasoning contributed by stronger collaborators (guidability). The authors formalize this capability, introduce twin tests along these two dimensions, and systematically evaluate 15 open-weight LLMs (1.5B–32B). Counterintuitively, models that score higher on standard benchmarks are often more fragile under distraction, and all models tested solve under 9.2% of problems beyond their inherent capabilities even when given correct guiding steps from a stronger collaborator. Controlled studies isolate how three post-training factors (the choice of distillation teacher, the use of RL, and the data selection strategy) shape these behaviors; notably, a teacher's suboptimal recoverability transfers to distilled students even when the distillation trajectories are correct. The work exposes fundamental bottlenecks in multi-model collaborative reasoning and provides empirical grounding for developing collaboration-aware reasoning training paradigms.

📝 Abstract
Reasoning LLMs are trained to verbalize their reasoning process, yielding strong gains on complex tasks. This transparency also opens a promising direction: multiple reasoners can directly collaborate on each other's thinking within a shared trajectory, yielding better inference efficiency and exploration. A key prerequisite, however, is the ability to assess the usefulness and build on another model's partial thinking -- we call this off-trajectory reasoning. Our paper investigates a critical question: can standard solo-reasoning training pipelines deliver desired off-trajectory behaviors? We propose twin tests that capture the two extremes of the off-trajectory spectrum, namely Recoverability, which tests whether LLMs can backtrack from "distractions" induced by misleading reasoning traces, and Guidability, which tests their ability to build upon correct reasoning from stronger collaborators. Our study evaluates 15 open-weight LLMs (1.5B-32B) and reveals a counterintuitive finding -- "stronger" LLMs on benchmarks are often more fragile under distraction. Moreover, all models tested fail to effectively leverage guiding steps from collaborators on problems beyond their inherent capabilities with solve rates remaining under 9.2%. Finally, we conduct control studies to isolate the effects of three factors in post-training on these behaviors: the choice of distillation teacher, the use of RL, and data selection strategy. Our results provide actionable insights for training natively strong reasoning collaborators; e.g., we find that suboptimal recoverability behaviors of teacher models are transferred to distilled students even if the distillation trajectories are correct. Taken together, this work lays the groundwork for evaluating multi-model collaborations in shared reasoning trajectories and highlights the limitations of off-the-shelf reasoning LLMs.
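The two tests described above share one evaluation pattern: inject a reasoning prefix the model did not generate itself (a misleading trace for Recoverability, correct partial steps from a stronger model for Guidability) and score whether the model's continuation still reaches the right answer. A minimal sketch of that harness is below; all names (`Trial`, `solve_rate`, the prompt format) are illustrative assumptions, not the authors' actual code.

```python
# Hypothetical sketch of the paper's twin-test protocol.
# A "Trial" pairs a problem with an injected reasoning prefix:
# a misleading trace (Recoverability) or correct partial steps
# from a stronger collaborator (Guidability).
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trial:
    problem: str
    injected_prefix: str  # reasoning the model did NOT generate itself
    answer: str


def solve_rate(model: Callable[[str], str], trials: List[Trial]) -> float:
    """Continue each injected prefix and score the final answer."""
    solved = 0
    for t in trials:
        # Off-trajectory reasoning: the model must assess and build on
        # (or backtrack from) someone else's partial thinking.
        completion = model(t.problem + "\n" + t.injected_prefix)
        solved += int(t.answer in completion)
    return solved / len(trials)
```

Running `solve_rate` on trials with misleading prefixes would estimate recoverability, and on trials with correct guiding prefixes, guidability; the paper's <9.2% figure corresponds to the latter on problems beyond the model's solo capability.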
Problem

Research questions and friction points this paper is trying to address.

Investigating whether LLMs can collaborate on shared reasoning trajectories
Testing LLMs' ability to backtrack from misleading reasoning traces
Evaluating LLMs' capacity to build upon collaborators' correct reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Twin tests (Recoverability and Guidability) capturing the two extremes of off-trajectory reasoning
Systematic evaluation of 15 open-weight LLMs (1.5B–32B) under injected reasoning traces
Controlled studies isolating three post-training factors: distillation teacher, use of RL, and data selection strategy