๐ค AI Summary
To address the non-transferability of communication protocols and the difficulty of cross-task knowledge reuse in multi-task multi-agent deep reinforcement learning (MADRL), this paper proposes the Multi-task Communication Skills (MCS) framework. MCS employs a Transformer encoder to construct a task-agnostic, shared message space, enabling unified representation and cross-task transfer of communication skills; it further introduces a predictive network to explicitly model the messageโaction relationship, enhancing coordination between communication and policy learning. To the best of our knowledge, MCS is the first multi-task MADRL method supporting communication skill sharing and transfer. Evaluated on three multi-task benchmark environments, MCS significantly outperforms both non-communicative multi-task baselines and single-task baselines (with and without communication), achieving an average task performance improvement of 18.7% and a 32.4% gain in sample efficiency.
๐ Abstract
In multi-agent deep reinforcement learning (MADRL), agents can communicate with one another to perform a task in a coordinated manner. When multiple tasks are involved, agents can also leverage knowledge from one task to improve learning in other tasks. In this paper, we propose Multi-task Communication Skills (MCS), a MADRL with communication method that learns and performs multiple tasks simultaneously, with agents interacting through learnable communication protocols. MCS employs a Transformer encoder to encode task-specific observations into a shared message space, capturing shared communication skills among agents. To enhance coordination among agents, we introduce a prediction network that correlates messages with the actions of sender agents in each task. We adapt three multi-agent benchmark environments to multi-task settings, where the number of agents as well as the observation and action spaces vary across tasks. Experimental results demonstrate that MCS achieves better performance than multi-task MADRL baselines without communication, as well as single-task MADRL baselines with and without communication.