Bridging the Capability Gap: Joint Alignment Tuning for Harmonizing LLM-based Multi-Agent Systems

📅 2025-09-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM-based multi-agent systems suffer from capability imbalance and inefficient collaboration because individual agents are fine-tuned in isolation. To address this, we propose MOAT, a novel framework that, for the first time, enables joint alignment and co-optimization of planning agents and grounding (execution) agents. MOAT alternates between phased alignment stages, coupling subgoal sequence generation with a self-constructing mechanism for diverse subgoal–action pairs; theoretical analysis proves that the training process is non-decreasing and asymptotically convergent. Empirical evaluation across six benchmarks demonstrates MOAT's superiority over state-of-the-art methods, with average improvements of 3.1% on in-distribution tasks and 4.4% on out-of-distribution tasks. MOAT thus offers a formally grounded, scalable paradigm for cooperative multi-agent optimization.

📝 Abstract
The advancement of large language models (LLMs) has enabled the construction of multi-agent systems that solve complex tasks by dividing responsibilities among specialized agents, such as a planning agent for subgoal generation and a grounding agent for executing tool-use actions. Most existing methods fine-tune these agents independently, leading to capability gaps among agents and poor coordination. To address this, we propose MOAT, a Multi-Agent Joint Alignment Tuning framework that improves agent collaboration through iterative alignment. MOAT alternates between two key stages: (1) Planning Agent Alignment, which optimizes the planning agent to generate subgoal sequences that better guide the grounding agent; and (2) Grounding Agent Improving, which fine-tunes the grounding agent on diverse subgoal-action pairs generated by the agent itself to enhance its generalization capability. Theoretical analysis proves that MOAT ensures a non-decreasing and progressively convergent training process. Experiments across six benchmarks demonstrate that MOAT outperforms state-of-the-art baselines, achieving average improvements of 3.1% on held-in tasks and 4.4% on held-out tasks.
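The alternating two-stage loop in the abstract can be illustrated with a deliberately simplified numerical sketch. This is not the paper's training procedure (which fine-tunes LLM agents on generated subgoal-action pairs); it is a toy model, with made-up scalar "capability" values and update rules, showing why alternating between aligning the planner to the grounder's feedback and improving the grounder under the updated planner can yield the non-decreasing progress MOAT's analysis guarantees:

```python
def moat_train(p=0.3, g=0.3, rounds=20, lr=0.5):
    """Toy alternating alignment loop in the spirit of MOAT (illustrative only).

    p: planning-agent capability, g: grounding-agent capability, both in [0, 1].
    Joint task success is modeled as p * g; each stage pulls one agent toward
    the other, so the weaker capability catches up and success never drops.
    """
    history = [p * g]
    for _ in range(rounds):
        # Stage 1: Planning Agent Alignment -- refine subgoal generation using
        # feedback on what the grounding agent can actually execute.
        p = p + lr * (1 - p) * g
        # Stage 2: Grounding Agent Improving -- fine-tune on self-generated
        # subgoal-action pairs produced under the updated planner.
        g = g + lr * (1 - g) * p
        history.append(p * g)
    return history

hist = moat_train()
# Both update terms are non-negative, so joint success is non-decreasing.
assert all(b >= a for a, b in zip(hist, hist[1:]))
```

In this toy model the product `p * g` rises monotonically toward 1, mirroring the non-decreasing, convergent training process the paper proves for the real alternating optimization.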
Problem

Research questions and friction points this paper is trying to address.

Addressing capability gaps in multi-agent LLM systems
Improving coordination between planning and grounding agents
Enhancing generalization through joint alignment tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint alignment tuning for multi-agent systems
Iterative alignment between planning and grounding agents
Theoretical convergence and improved generalization capability