Multi-agent In-context Coordination via Decentralized Memory Retrieval

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
In cooperative multi-agent reinforcement learning (MARL), decentralized execution hinders task alignment, leads to inaccurate credit assignment, and impairs cross-task generalization. To address these challenges, this paper proposes MAICC: (1) a centralized Transformer-based trajectory embedding model that explicitly encodes team-level task context; (2) a decentralized policy network augmented with a neighbor-aware memory retrieval mechanism, enabling online integration of real-time observations with offline team memory; and (3) a team-individual hybrid utility score that enables dynamic credit assignment and context-aware decision-making. Evaluated on the Level-Based Foraging (LBF) and StarCraft Multi-Agent Challenge (SMAC) benchmarks, MAICC significantly accelerates task adaptation, improves collaborative efficiency, and achieves superior cross-task generalization compared with existing state-of-the-art methods.

📝 Abstract
Large transformer models, trained on diverse datasets, have demonstrated impressive few-shot performance on previously unseen tasks without requiring parameter updates. This capability has also been explored in Reinforcement Learning (RL), where agents interact with the environment to retrieve context and maximize cumulative rewards, showcasing strong adaptability in complex settings. However, in cooperative Multi-Agent Reinforcement Learning (MARL), where agents must coordinate toward a shared goal, decentralized policy deployment can lead to mismatches in task alignment and reward assignment, limiting the efficiency of policy adaptation. To address this challenge, we introduce Multi-agent In-context Coordination via Decentralized Memory Retrieval (MAICC), a novel approach designed to enhance coordination by fast adaptation. Our method involves training a centralized embedding model to capture fine-grained trajectory representations, followed by decentralized models that approximate the centralized one to obtain team-level task information. Based on the learned embeddings, relevant trajectories are retrieved as context, which, combined with the agents' current sub-trajectories, inform decision-making. During decentralized execution, we introduce a novel memory mechanism that effectively balances test-time online data with offline memory. Based on the constructed memory, we propose a hybrid utility score that incorporates both individual- and team-level returns, ensuring credit assignment across agents. Extensive experiments on cooperative MARL benchmarks, including Level-Based Foraging (LBF) and SMAC (v1/v2), show that MAICC enables faster adaptation to unseen tasks compared to existing methods. Code is available at https://github.com/LAMDA-RL/MAICC.
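The abstract describes retrieving relevant trajectories as context by matching learned embeddings against an offline memory. The paper does not specify the similarity measure here, so the following is a minimal sketch assuming cosine-similarity top-k retrieval; `retrieve_context` and all names are illustrative, not the authors' API.

```python
import numpy as np

def retrieve_context(query_emb, memory_embs, memory_trajs, k=4):
    """Return the k stored trajectories whose embeddings are most
    similar (by cosine similarity) to the current sub-trajectory's
    embedding. Assumed formulation, for illustration only."""
    q = query_emb / (np.linalg.norm(query_emb) + 1e-8)
    m = memory_embs / (np.linalg.norm(memory_embs, axis=1, keepdims=True) + 1e-8)
    sims = m @ q                  # cosine similarity to each memory entry
    top = np.argsort(-sims)[:k]   # indices of the k best matches
    return [memory_trajs[i] for i in top]

# Toy usage: 5 stored trajectories with 3-d embeddings; the query is a
# slightly perturbed copy of entry 2, so it should be retrieved first.
rng = np.random.default_rng(0)
memory_embs = rng.normal(size=(5, 3))
memory_trajs = [f"traj_{i}" for i in range(5)]
query = memory_embs[2] + 0.01 * rng.normal(size=3)
ctx = retrieve_context(query, memory_embs, memory_trajs, k=2)
print(ctx)
```

The retrieved trajectories would then be concatenated with the agent's current sub-trajectory as in-context input to the policy.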
Problem

Research questions and friction points this paper is trying to address.

Addresses coordination mismatches in decentralized multi-agent reinforcement learning systems
Enhances policy adaptation through decentralized memory retrieval and hybrid utility scoring
Improves team-level task alignment and credit assignment for cooperative agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized memory retrieval for multi-agent coordination
Hybrid utility score balancing individual and team returns
Centralized embedding model with decentralized policy approximation
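The hybrid utility score above combines individual- and team-level returns to score memory entries. The paper's exact formula is not given in this summary, so here is a hedged sketch assuming a simple convex mixture; the weight `alpha` and the entry fields are hypothetical.

```python
def hybrid_utility(individual_return, team_return, alpha=0.5):
    """Illustrative mixture of individual- and team-level returns.
    alpha weights the team return; this is an assumed form, not
    necessarily the paper's exact scoring function."""
    return alpha * team_return + (1.0 - alpha) * individual_return

# Ranking two candidate memory entries by the mixed score, with a
# team-leaning weight (alpha=0.7).
entries = [
    {"id": "m1", "indiv": 2.0, "team": 1.0},
    {"id": "m2", "indiv": 0.5, "team": 3.0},
]
ranked = sorted(entries,
                key=lambda e: hybrid_utility(e["indiv"], e["team"], alpha=0.7),
                reverse=True)
ranked_ids = [e["id"] for e in ranked]
print(ranked_ids)  # → ['m2', 'm1']
```

A team-leaning weight makes the entry with the higher team return win even though its individual return is lower, which matches the stated goal of balancing credit assignment across agents.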
Tao Jiang
National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
Zichuan Lin
Tencent
Reinforcement Learning
Lihe Li
National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
Yichen Li
National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
Cong Guan
National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
Lei Yuan
National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
Zongzhang Zhang
Nanjing University
Artificial Intelligence, Reinforcement Learning, Probabilistic Planning, Multi-Agent Systems
Yang Yu
National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
Deheng Ye
Director of AI, Tencent
Applied machine learning