🤖 AI Summary
Existing LLM-based multi-agent systems suffer from capability imbalance and inefficient collaboration because their individual agents are fine-tuned in isolation. To address this, we propose MOAT, a novel framework that, for the first time, enables joint alignment and co-optimization of planning agents and grounding (execution) agents. MOAT employs alternating optimization with phased alignment, coupled with subgoal sequence generation and a self-constructing mechanism for diverse subgoal–action pairs. Theoretical analysis establishes that training progress is non-decreasing and converges asymptotically, while empirical evaluation across six benchmarks demonstrates MOAT's superiority over state-of-the-art methods: it achieves average improvements of 3.1% on in-distribution tasks and 4.4% on out-of-distribution tasks. MOAT thus offers a formally grounded, scalable paradigm for cooperative multi-agent optimization.
📝 Abstract
The advancement of large language models (LLMs) has enabled the construction of multi-agent systems that solve complex tasks by dividing responsibilities among specialized agents, such as a planning agent for subgoal generation and a grounding agent for executing tool-use actions. Most existing methods fine-tune these agents independently, leading to capability gaps among them and poor coordination. To address this, we propose MOAT, a Multi-Agent Joint Alignment Tuning framework that improves agent collaboration through iterative alignment. MOAT alternates between two key stages: (1) Planning Agent Alignment, which optimizes the planning agent to generate subgoal sequences that better guide the grounding agent; and (2) Grounding Agent Improving, which fine-tunes the grounding agent on diverse subgoal-action pairs generated by the agent itself to enhance its generalization capability. Theoretical analysis proves that MOAT ensures a non-decreasing and progressively convergent training process. Experiments across six benchmarks demonstrate that MOAT outperforms state-of-the-art baselines, achieving average improvements of 3.1% on held-in tasks and 4.4% on held-out tasks.
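The alternating two-stage loop described in the abstract can be sketched as follows. This is a toy illustration, not the authors' implementation: the agents are reduced to scalar "capability" values, and `align_planner`, `improve_grounder`, and `joint_score` are hypothetical stand-ins for the LLM fine-tuning steps and task success rate. The sketch only shows the control flow and the non-decreasing progress property the paper claims.

```python
# Toy sketch of MOAT's alternating optimization (hypothetical names throughout).
# Each agent is modeled as a scalar capability in [0, 1]; real MOAT fine-tunes
# LLMs at each stage instead.

def joint_score(planner: float, grounder: float) -> float:
    """Stand-in for end-to-end task success; higher is better, capped at 1."""
    return min(1.0, 0.5 * planner + 0.5 * grounder)

def align_planner(planner: float, grounder: float) -> float:
    """Stage 1 (Planning Agent Alignment): improve subgoal sequences
    so they better guide the current grounding agent."""
    return min(1.0, planner + 0.1 * (1.0 - joint_score(planner, grounder)))

def improve_grounder(planner: float, grounder: float) -> float:
    """Stage 2 (Grounding Agent Improving): fine-tune the grounder on
    self-constructed subgoal-action pairs."""
    return min(1.0, grounder + 0.1 * (1.0 - joint_score(planner, grounder)))

def moat_training(planner: float = 0.2, grounder: float = 0.2, rounds: int = 10):
    """Alternate the two stages and record the joint score after each round."""
    scores = [joint_score(planner, grounder)]
    for _ in range(rounds):
        planner = align_planner(planner, grounder)      # stage 1
        grounder = improve_grounder(planner, grounder)  # stage 2
        scores.append(joint_score(planner, grounder))
    return scores

scores = moat_training()
# In this toy objective each stage can only raise the joint score,
# mirroring the paper's non-decreasing training guarantee.
assert all(b >= a for a, b in zip(scores, scores[1:]))
```

Because each stage holds one agent fixed while improving the other against the shared objective, the joint score cannot decrease, which is the intuition behind the convergence result stated in the abstract.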