Cooperative Multi-Agent Planning with Adaptive Skill Synthesis

📅 2025-02-14
🤖 AI Summary
To address core challenges in multi-agent reinforcement learning (MARL) under partial observability, including low sample efficiency, poor policy interpretability, limited transferability, and non-Markovian inter-agent interactions, this paper proposes COMPASS, a decentralized closed-loop decision-making framework. It integrates vision-language models (VLMs) for perception and reasoning; maintains a dynamic skill library that is bootstrapped from demonstrations and evolves through planner-guided tasks; and propagates entity information between agents through structured multi-hop communication. Evaluated on symmetric scenarios in SMACv2, the approach achieves up to 30% higher win rates than state-of-the-art MARL algorithms. It is also reported to generalize zero-shot across tasks, with policies that are interpretable through explicit skill composition and communication traces.

📝 Abstract
Despite much progress in training distributed artificial intelligence (AI), building cooperative multi-agent systems with multi-agent reinforcement learning (MARL) faces challenges in sample efficiency, interpretability, and transferability. Unlike traditional learning-based methods that require extensive interaction with the environment, large language models (LLMs) demonstrate remarkable capabilities in zero-shot planning and complex reasoning. However, existing LLM-based approaches heavily rely on text-based observations and struggle with the non-Markovian nature of multi-agent interactions under partial observability. We present COMPASS, a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making. The skill library, bootstrapped from demonstrations, evolves via planner-guided tasks to enable adaptive strategies. COMPASS propagates entity information through multi-hop communication under partial observability. Evaluations on the improved StarCraft Multi-Agent Challenge (SMACv2) demonstrate COMPASS achieves up to 30% higher win rates than state-of-the-art MARL algorithms in symmetric scenarios.
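The multi-hop propagation of entity information mentioned in the abstract can be illustrated with a small sketch. This is not the COMPASS implementation: `EntityInfo`, `propagate`, the neighbor graph, and the "keep the most recent observation" merge rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityInfo:
    """Hypothetical record of one observed entity (not from the paper)."""
    entity_id: str
    position: tuple[int, int]
    observed_at: int  # timestep at which the entity was seen


def propagate(beliefs, neighbors, hops):
    """Synchronously share entity beliefs for a fixed number of hops.

    beliefs:   agent -> {entity_id -> EntityInfo} from local observation
    neighbors: agent -> list of agents it can hear from
    Assumed merge rule: keep the most recent observation per entity.
    """
    current = {agent: dict(known) for agent, known in beliefs.items()}
    for _ in range(hops):
        # Read from the previous round only, so one hop moves
        # information exactly one edge further along the graph.
        updated = {agent: dict(known) for agent, known in current.items()}
        for agent, peers in neighbors.items():
            for peer in peers:
                for entity_id, info in current[peer].items():
                    old = updated[agent].get(entity_id)
                    if old is None or info.observed_at > old.observed_at:
                        updated[agent][entity_id] = info
        current = updated
    return current


# Three agents in a line: only "a" sees the enemy "e1".
beliefs = {"a": {"e1": EntityInfo("e1", (4, 2), 3)}, "b": {}, "c": {}}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

after_one_hop = propagate(beliefs, neighbors, hops=1)
after_two_hops = propagate(beliefs, neighbors, hops=2)
```

In this toy setup, agent `b` learns about `e1` after one hop, while agent `c`, which never observes the enemy directly, learns about it only after two hops.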
Problem

Research questions and friction points this paper is trying to address.

Improve sample efficiency in cooperative multi-agent planning.
Improve interpretability and transferability in MARL.
Handle non-Markovian multi-agent interactions under partial observability.
Innovation

Methods, ideas, or system contributions that make the work stand out.

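Integrates vision-language models (VLMs) for decentralized closed-loop decision-making.
Evolves a dynamic skill library, bootstrapped from demonstrations, through planner-guided tasks.
Propagates entity information through multi-hop communication under partial observability.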
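To make the idea of a skill library that starts from demonstrations and grows over time more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's design: the `Skill` and `SkillLibrary` classes, the keyword-overlap retrieval, and the success-rate tie-break are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    """Hypothetical skill: a described policy plus usage statistics."""
    name: str
    description: str
    policy: Callable[[dict], str]  # maps an observation to an action label
    successes: int = 0
    attempts: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


class SkillLibrary:
    def __init__(self, demonstrations):
        # Bootstrap the library from skills extracted from demonstrations.
        self._skills = {skill.name: skill for skill in demonstrations}

    def add(self, skill):
        """Add a newly synthesized skill, replacing any skill with the same name."""
        self._skills[skill.name] = skill

    def record(self, name, success):
        """Update usage statistics after a skill has been executed."""
        skill = self._skills[name]
        skill.attempts += 1
        skill.successes += int(success)

    def retrieve(self, task, top_k=1):
        """Rank skills by word overlap with the task, then by success rate."""
        words = set(task.lower().split())

        def score(skill):
            overlap = len(words & set(skill.description.lower().split()))
            return (overlap, skill.success_rate)

        return sorted(self._skills.values(), key=score, reverse=True)[:top_k]


library = SkillLibrary([
    Skill("focus_fire", "attack the weakest visible enemy", lambda obs: "attack"),
])
library.add(Skill(
    "kite",
    "retreat from melee enemy then attack",
    lambda obs: "move" if obs.get("enemy_in_melee_range") else "attack",
))
best = library.retrieve("retreat from melee units")[0]
```

Here `best` is the `kite` skill, because its description shares the most words with the task. A real system would likely use a stronger retrieval signal than word overlap; the simple rule just keeps the sketch self-contained.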