🤖 AI Summary
Multi-agent systems struggle to allocate inference-time compute efficiently under explicit budget constraints, which limits effective collaboration among agents.
Method: This paper proposes a modular, collaboration-driven framework for planning inference-time computation. It introduces a collaboration-module abstraction and a two-level planning architecture: the upper level automatically extracts reusable multi-agent workflows via self-play reflection, while the lower level combines state awareness with multi-step lookahead prediction to dynamically optimize per-agent compute allocation under budget constraints.
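The lower-level allocation idea can be sketched as a toy greedy scheduler: each agent has an estimated marginal gain from extra compute (a stand-in for the paper's lookahead prediction), and units of a fixed budget go to whichever agent currently promises the most. All names and the diminishing-returns model here are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of budget-constrained per-agent compute allocation.
# `Agent`, `marginal_gain`, and the 0.5 decay are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    # Estimated marginal success-probability gain per extra compute unit,
    # a simple stand-in for multi-step lookahead value prediction.
    marginal_gain: float

def allocate_compute(agents: list[Agent], budget: int, unit: int = 1) -> dict[str, int]:
    """Greedily hand out compute units: each unit goes to the agent whose
    current (diminishing) estimated gain is highest, until the budget is spent."""
    alloc = {a.name: 0 for a in agents}
    gains = {a.name: a.marginal_gain for a in agents}
    remaining = budget
    while remaining >= unit:
        best = max(gains, key=gains.get)  # agent with highest estimated gain
        alloc[best] += unit
        remaining -= unit
        gains[best] *= 0.5  # assume diminishing returns per extra unit
    return alloc

agents = [Agent("planner", 0.30), Agent("coder", 0.50), Agent("critic", 0.20)]
print(allocate_compute(agents, budget=5))  # → {'planner': 2, 'coder': 2, 'critic': 1}
```

The point of the sketch is only that allocation is state-dependent and budget-exhaustive; the paper's planner additionally speculates over future steps rather than using a fixed decay.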
Results: Evaluated on multiple challenging multi-agent benchmarks, the method significantly improves task success rates and remains consistently superior across diverse explicit compute budgets. The authors present it as the first approach to achieve collaboration-oriented, budget-controllable, and generalizable inference-time computation scheduling.
📝 Abstract
Scaling test-time computation improves large language model performance without additional training. Recent work demonstrates that techniques such as repeated sampling, self-verification, and self-reflection can significantly enhance task success by allocating more inference-time compute. However, applying these techniques across multiple agents in a multi-agent system is difficult: there are no principled mechanisms to allocate compute in ways that foster collaboration among agents, to extend test-time scaling to collaborative interactions, or to distribute compute across agents under explicit budget constraints. To address this gap, we propose FutureWeaver, a framework for planning and optimizing test-time compute allocation in multi-agent systems under fixed budgets. FutureWeaver introduces modularized collaboration, formalized as callable functions that encapsulate reusable multi-agent workflows. These modules are automatically derived through self-play reflection by abstracting recurring interaction patterns from past trajectories. Building on these modules, FutureWeaver employs a dual-level planning architecture that optimizes compute allocation by reasoning over the current task state while also speculating on future steps. Experiments on complex agent benchmarks demonstrate that FutureWeaver consistently outperforms baselines across diverse budget settings, validating its effectiveness for inference-time optimization in multi-agent collaboration.
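The abstract's "modularized collaboration, formalized as callable functions" can be pictured as wrapping a recurring interaction pattern into a single function a planner can invoke. The sketch below is a minimal assumed illustration with stub agents standing in for LLM calls; the paper derives such modules automatically via self-play reflection rather than by hand.

```python
# Hypothetical illustration of a collaboration module: a recurring
# draft-then-review interaction packaged as one callable function.
# Agent behaviors are stubs, not the paper's implementation.
from typing import Callable

def make_draft_review_module(drafter: Callable[[str], str],
                             reviewer: Callable[[str], str]) -> Callable[[str], str]:
    """Encapsulate a two-agent workflow (draft, then review the draft)
    as a single callable a planner can schedule like any function."""
    def module(task: str) -> str:
        draft = drafter(task)       # first agent produces a draft
        return reviewer(draft)      # second agent refines/checks it
    return module

# Stub agents for demonstration (placeholders for real model calls).
drafter = lambda task: f"draft({task})"
reviewer = lambda draft: f"reviewed({draft})"

solve = make_draft_review_module(drafter, reviewer)
print(solve("summarize report"))  # → reviewed(draft(summarize report))
```

Once interaction patterns are closed over like this, the planner's search space shrinks from raw per-message decisions to choosing among a library of reusable modules.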