🤖 AI Summary
In cooperative multi-agent reinforcement learning (MARL), inefficient exploration and misalignment between individual and collective exploration hinder sample efficiency and convergence. To address this, we propose Adaptive exploration via Identity Recognition (AIR), a framework that jointly optimizes a trajectory identity classifier and an adversarial action selector to unify the modeling of individual uncertainty and population-level behavioral diversity. AIR is the first approach with a theoretical guarantee that its adversarial mechanism promotes individual and collective exploration synergistically, without introducing auxiliary networks or explicit modifications to policy parameters, and it supports end-to-end exploration control. Evaluated on multiple MARL benchmarks, AIR significantly improves sample efficiency and final performance, accelerating convergence by 37%–52% and generalizing better than existing exploration methods.
📝 Abstract
Exploration in cooperative multi-agent reinforcement learning (MARL) remains challenging for value-based agents due to the absence of an explicit policy. Existing approaches include individual exploration, driven by each agent's uncertainty about the system, and collective exploration, driven by behavioral diversity among agents. However, the additional structures these methods introduce often reduce training efficiency and make the two approaches difficult to combine. In this paper, we propose Adaptive exploration via Identity Recognition (AIR), which consists of two adversarial components: a classifier that recognizes agent identities from their trajectories, and an action selector that adaptively adjusts the mode and degree of exploration. We theoretically prove that AIR facilitates both individual and collective exploration during training, and experiments demonstrate the efficiency and effectiveness of AIR across various tasks.
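The adversarial interplay described above can be sketched in code. The snippet below is a minimal, illustrative toy, not the paper's implementation: it assumes the classifier is a linear softmax model over fixed trajectory features, and that the action selector consumes an exploration bonus that grows when an agent's trajectory is hard to attribute to it (so indistinguishable agents are pushed toward more diverse behavior). All names, the feature representation, and the bonus form are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over identity logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class IdentityClassifier:
    """Toy linear classifier mapping trajectory features to agent identities.

    Illustrative stand-in for AIR's trajectory identity classifier.
    """
    def __init__(self, feat_dim, n_agents, lr=0.1):
        self.W = np.zeros((n_agents, feat_dim))
        self.lr = lr

    def probs(self, feat):
        # Distribution over agent identities given trajectory features.
        return softmax(self.W @ feat)

    def update(self, feat, agent_id):
        # One cross-entropy gradient step; returns p(correct identity).
        p = self.probs(feat)
        grad = np.outer(p, feat)
        grad[agent_id] -= feat
        self.W -= self.lr * grad
        return p[agent_id]

def exploration_bonus(p_correct, scale=1.0):
    # Assumed bonus shape: low identifiability -> larger bonus,
    # nudging the action selector toward more distinctive behavior.
    return -scale * np.log(p_correct + 1e-8)
```

In this toy loop the classifier improves at recognizing agents from their trajectories while the bonus it induces rewards agents for behavior the classifier cannot yet attribute, mirroring the adversarial classifier/selector pairing at a very coarse level.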