🤖 AI Summary
This work addresses the fundamental trade-off between reasoning efficiency and robustness faced by large language model agents in complex tasks. To this end, we propose a self-driven collaborative reasoning framework that eliminates external routing modules and instead leverages agents' intrinsic reflection signals to dynamically coordinate models of varying capability levels. The framework invokes stronger but costlier models only when necessary and allocates reasoning budget through a difficulty-aware, cumulative escalation strategy, thereby achieving stable and efficient long-horizon reasoning. Empirical evaluations across multiple multi-step agent benchmarks demonstrate that our approach significantly advances the Pareto frontier of accuracy and efficiency.
📝 Abstract
Autonomous agents powered by large language models (LLMs) perform complex tasks through long-horizon reasoning and tool interaction, where a fundamental trade-off arises between execution efficiency and reasoning robustness. Models at different capability-cost levels offer complementary advantages: lower-cost models enable fast execution but may struggle on difficult reasoning segments, while stronger models provide more robust reasoning at higher computational cost. We present AgentCollab, a self-driven collaborative inference framework that dynamically coordinates models with different reasoning capacities during agent execution. Instead of relying on external routing modules, the framework uses the agent's own self-reflection signal to determine whether the current reasoning trajectory is making meaningful progress, and escalates control to a stronger reasoning tier only when necessary. To further stabilize long-horizon execution, we introduce a difficulty-aware cumulative escalation strategy that allocates additional reasoning budget based on recent failure signals. We instantiate the framework in a two-tier setting with a small and a large model. Experiments on diverse multi-step agent benchmarks show that AgentCollab consistently improves the accuracy-efficiency Pareto frontier of LLM agents.
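The escalation logic described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea (class and method names are our assumptions, not the authors' actual implementation): self-reflection outcomes accumulate in a sliding window, and control escalates to the stronger model tier once recent failures cross a threshold.

```python
# Hypothetical sketch of self-reflection-driven tier escalation.
# All names here are illustrative assumptions, not AgentCollab's real API.

from collections import deque

class EscalationController:
    """Difficulty-aware cumulative escalation over a sliding window of steps."""

    def __init__(self, window: int = 5, threshold: int = 2):
        self.failures = deque(maxlen=window)  # recent self-reflection outcomes
        self.threshold = threshold            # failures needed before escalating

    def record(self, made_progress: bool) -> None:
        # 1 marks a step the agent's self-reflection judged unproductive,
        # 0 marks a productive step; old entries fall out of the window.
        self.failures.append(0 if made_progress else 1)

    def tier(self) -> str:
        # Escalate to the strong (costly) model once accumulated recent
        # failures reach the threshold; otherwise stay on the cheap model.
        return "strong" if sum(self.failures) >= self.threshold else "weak"

ctrl = EscalationController(window=5, threshold=2)
ctrl.record(made_progress=True)
print(ctrl.tier())   # weak: no recent failures
ctrl.record(made_progress=False)
ctrl.record(made_progress=False)
print(ctrl.tier())   # strong: two recent failures reached the threshold
```

Because the window is bounded, a burst of failures early in a long trajectory does not permanently pin the agent to the expensive tier; once productive steps push the failures out of the window, control returns to the cheap model.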