🤖 AI Summary
Multi-agent systems struggle to allocate inference-time compute efficiently under explicit budget constraints, which limits effective collaboration among agents.
Method: This paper proposes a modular, collaboration-driven framework for planning inference-time computation. It introduces a collaboration-module abstraction and a two-level planning architecture: the upper level automatically extracts reusable multi-agent workflows via self-play reflection, while the lower level combines state awareness with multi-step lookahead prediction to dynamically optimize per-agent compute allocation under budget constraints.
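The lower-level allocation idea can be sketched as a toy greedy scheduler: each agent has an estimated marginal gain from extra compute (a stand-in for the paper's lookahead prediction), and units of a fixed budget go to whichever agent currently promises the most. All names and the diminishing-returns model here are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of budget-constrained per-agent compute allocation.
# `Agent`, `marginal_gain`, and the 0.5 decay are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    # Estimated marginal success-probability gain per extra compute unit,
    # a simple stand-in for multi-step lookahead value prediction.
    marginal_gain: float

def allocate_compute(agents: list[Agent], budget: int, unit: int = 1) -> dict[str, int]:
    """Greedily hand out compute units: each unit goes to the agent whose
    current (diminishing) estimated gain is highest, until the budget is spent."""
    alloc = {a.name: 0 for a in agents}
    gains = {a.name: a.marginal_gain for a in agents}
    remaining = budget
    while remaining >= unit:
        best = max(gains, key=gains.get)  # agent with highest estimated gain
        alloc[best] += unit
        remaining -= unit
        gains[best] *= 0.5  # assume diminishing returns per extra unit
    return alloc

agents = [Agent("planner", 0.30), Agent("coder", 0.50), Agent("critic", 0.20)]
print(allocate_compute(agents, budget=5))  # → {'planner': 2, 'coder': 2, 'critic': 1}
```

The point of the sketch is only that allocation is state-dependent and budget-exhaustive; the paper's planner additionally speculates over future steps rather than using a fixed decay.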
Results: Evaluated on multiple challenging multi-agent benchmarks, the method significantly improves task success rates and remains consistently superior across diverse explicit compute budgets. The authors present it as the first approach to achieve collaboration-oriented, budget-controllable, and generalizable inference-time computation scheduling.
📝 Abstract
Scaling test-time computation improves large language model performance without additional training. Recent work demonstrates that techniques such as repeated sampling, self-verification, and self-reflection can significantly enhance task success by allocating more inference-time compute. However, applying these techniques across multiple agents in a multi-agent system is difficult: there are no principled mechanisms to allocate compute in ways that foster collaboration among agents, to extend test-time scaling to collaborative interactions, or to distribute compute across agents under explicit budget constraints. To address this gap, we propose FutureWeaver, a framework for planning and optimizing test-time compute allocation in multi-agent systems under fixed budgets. FutureWeaver introduces modularized collaboration, formalized as callable functions that encapsulate reusable multi-agent workflows. These modules are automatically derived through self-play reflection by abstracting recurring interaction patterns from past trajectories. Building on these modules, FutureWeaver employs a dual-level planning architecture that optimizes compute allocation by reasoning over the current task state while also speculating on future steps. Experiments on complex agent benchmarks demonstrate that FutureWeaver consistently outperforms baselines across diverse budget settings, validating its effectiveness for inference-time optimization in multi-agent collaboration.
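The abstract's "modularized collaboration, formalized as callable functions" can be pictured as wrapping a recurring interaction pattern into a single function a planner can invoke. The sketch below is a minimal assumed illustration with stub agents standing in for LLM calls; the paper derives such modules automatically via self-play reflection rather than by hand.

```python
# Hypothetical illustration of a collaboration module: a recurring
# draft-then-review interaction packaged as one callable function.
# Agent behaviors are stubs, not the paper's implementation.
from typing import Callable

def make_draft_review_module(drafter: Callable[[str], str],
                             reviewer: Callable[[str], str]) -> Callable[[str], str]:
    """Encapsulate a two-agent workflow (draft, then review the draft)
    as a single callable a planner can schedule like any function."""
    def module(task: str) -> str:
        draft = drafter(task)       # first agent produces a draft
        return reviewer(draft)      # second agent refines/checks it
    return module

# Stub agents for demonstration (placeholders for real model calls).
drafter = lambda task: f"draft({task})"
reviewer = lambda draft: f"reviewed({draft})"

solve = make_draft_review_module(drafter, reviewer)
print(solve("summarize report"))  # → reviewed(draft(summarize report))
```

Once interaction patterns are closed over like this, the planner's search space shrinks from raw per-message decisions to choosing among a library of reusable modules.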