Contrastive learning-based agent modeling for deep reinforcement learning

📅 2023-12-30

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing agent modeling approaches in multi-agent reinforcement learning (MARL) typically assume either access to the modeled agents' local observations during training or a long observation trajectory for policy adaptation, which restricts where they can be applied. Contrastive Learning-based Agent Modeling (CLAM) removes both assumptions: it models other agents' policies using only the ego agent's local observations, during training and execution alike. CLAM uses contrastive learning to produce consistent policy representations in real time from the beginning of each episode, and these representations inform the ego agent's adaptive policy, which is trained with reinforcement learning. Evaluated in cooperative and competitive multi-agent environments, CLAM achieves state-of-the-art performance on both kinds of task.

📝 Abstract

Multi-agent systems often require agents to collaborate with or compete against other agents with diverse goals, behaviors, or strategies. Agent modeling is essential when designing adaptive policies for intelligent machine agents in multi-agent systems, as this is the means by which the ego agent understands other agents' behavior and extracts meaningful representations of their policies. These representations can be used to enhance the ego agent's adaptive policy, which is trained by reinforcement learning. However, existing agent modeling approaches typically assume the availability of local observations from other agents (modeled agents) during training or a long observation trajectory for policy adaptation. To remove these restrictive assumptions and improve agent modeling performance, we devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution. With these observations, CLAM is capable of generating consistent, high-quality policy representations in real time from the beginning of each episode. We evaluated the efficacy of our approach in both cooperative and competitive multi-agent environments. Our experiments demonstrate that our approach achieves state-of-the-art performance on both cooperative and competitive tasks, highlighting the potential of contrastive learning-based agent modeling for enhancing reinforcement learning.
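
The abstract does not state which contrastive objective CLAM uses. As a rough illustration of the general idea only, the sketch below implements a standard InfoNCE-style loss in NumPy. It assumes, hypothetically, that each row of `anchors` and the matching row of `positives` are embeddings of two ego-observation segments recorded against the same modeled policy, and that the other rows in the batch act as negatives. The function name, the `temperature` parameter, and the pairing scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss over a batch of embedding pairs (illustrative only).

    anchors[i] and positives[i] are assumed to embed two ego-observation
    segments gathered against the same modeled policy; every other row in
    the batch is treated as a negative.
    """
    # L2-normalise so that dot products become cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, sharpened by the temperature.
    logits = a @ p.T / temperature
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Matching pairs sit on the diagonal; average their negative log-likelihood.
    return -np.mean(np.diag(log_probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))                       # hypothetical embeddings
    aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))
    unrelated = info_nce_loss(z, rng.normal(size=z.shape))
    print(f"aligned pairs: {aligned:.4f}, unrelated pairs: {unrelated:.4f}")
```

Minimising a loss of this kind pulls together embeddings that come from the same modeled policy and pushes apart those that come from different ones, which is one plausible way to obtain the consistent policy representations the abstract describes. In a full system the embeddings would come from a trainable encoder over the ego agent's observations rather than from random arrays.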

Problem

Research questions and friction points this paper is trying to address.

Modeling diverse agent behaviors without local observations from others

Improving real-time policy representation quality from ego agent observations

Enhancing reinforcement learning in cooperative and competitive multi-agent systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning for agent modeling

Uses only ego agent's local observations

Real-time high-quality policy representations
