🤖 AI Summary
In mixed traffic, cooperative decision-making for connected autonomous vehicles (CAVs) faces two key challenges in multi-agent reinforcement learning (MARL): exploration-exploitation imbalance and state-space explosion. Method: This paper proposes a topology-enhanced MARL framework. Its core innovations are (i) constructing a dynamic traffic-flow game topology tensor that structurally compresses the high-dimensional state space, and (ii) integrating visit counts with inter-agent mutual information to explicitly model collaborative exploration preferences. Using QMIX as the baseline architecture, the method is evaluated across varying traffic densities and CAV penetration rates. Results: The approach significantly improves decision efficiency, safety, and trajectory smoothness; achieves a 12.7% increase in task completion rate; and attains decision rationality comparable to, or even surpassing, that of human drivers. It establishes a scalable paradigm for cooperative control in complex, dynamic traffic environments.
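The summary pairs state compression with visit counting. The sketch below shows why the two fit together: once raw states are mapped to coarse keys, visit counts become meaningful, and a count-based bonus can reward rarely seen configurations. It rests on two assumptions that do not come from the paper. First, `topology_key` is a hypothetical stand-in that bins inter-vehicle gaps; it is not the authors' game topology tensor. Second, the bonus form `beta / sqrt(N)` is a common count-based choice, not necessarily the one TPE-MARL uses.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def topology_key(gaps: Sequence[float], bin_size: float = 5.0) -> Tuple[int, ...]:
    """Hypothetical state compression: bucket inter-vehicle gaps (m) into coarse bins.

    Stand-in for the paper's game topology tensor; not the authors' construction.
    """
    return tuple(int(g // bin_size) for g in gaps)


class CountBonus:
    """Count-based intrinsic reward beta / sqrt(N(key)) over compressed states."""

    def __init__(self, beta: float = 0.1) -> None:
        self.beta = beta          # hypothetical bonus scale
        self.counts = Counter()   # visit count N(key) per compressed state

    def __call__(self, key: Hashable) -> float:
        # Record the visit, then return a bonus that decays as the state becomes familiar.
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


if __name__ == "__main__":
    bonus = CountBonus(beta=0.1)
    k1 = topology_key([3.0, 12.0, 27.5])
    k2 = topology_key([4.9, 10.1, 29.0])  # different raw gaps, same coarse bins
    print(k1, k2)                 # (0, 2, 5) (0, 2, 5)
    print(bonus(k1), bonus(k2))   # the second visit to the shared key earns a smaller bonus
```

Because the two raw states collapse to the same key, the second one counts as a repeat visit. This is how a compressed representation keeps the count table small enough to drive exploration in a large joint state space.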
📝 Abstract
The exploration-exploitation trade-off constitutes one of the fundamental challenges in reinforcement learning (RL), which is exacerbated in multi-agent reinforcement learning (MARL) due to the exponential growth of joint state-action spaces. This paper proposes a topology-enhanced MARL (TPE-MARL) method for optimizing cooperative decision-making of connected and autonomous vehicles (CAVs) in mixed traffic. This work presents two primary contributions. First, we construct a game topology tensor for dynamic traffic flow, which effectively compresses high-dimensional traffic state information and reduces the search space for MARL algorithms. Second, building upon the designed game topology tensor and using QMIX as the backbone RL algorithm, we establish a topology-enhanced MARL framework incorporating visit counts and agent mutual information. Extensive simulations across varying traffic densities and CAV penetration rates demonstrate the effectiveness of TPE-MARL. Evaluations encompassing training dynamics, exploration patterns, macroscopic traffic performance metrics, and microscopic vehicle behaviors reveal that TPE-MARL successfully balances exploration and exploitation. Consequently, it exhibits superior performance in terms of traffic efficiency, safety, decision smoothness, and task completion. Furthermore, the algorithm demonstrates decision-making rationality comparable to or exceeding that of human drivers in both mixed-autonomy and fully autonomous traffic scenarios. Our code is available at https://github.com/leoPub/tpemarl.
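The abstract states that the framework incorporates agent mutual information but does not say how it is estimated. As one illustration, the sketch below computes a plug-in estimate of the mutual information between two agents' discrete actions from observed action pairs, using I(A_i; A_j) = Σ p(a_i, a_j) log[p(a_i, a_j) / (p(a_i) p(a_j))]. The function name and the action labels are hypothetical. Whether TPE-MARL uses this estimator, or how it weights the result in the reward, is an assumption not confirmed by the source.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def empirical_mutual_information(pairs: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Plug-in estimate of I(A_i; A_j) in nats from observed (a_i, a_j) action pairs.

    Hypothetical helper for illustration; not the authors' estimator.
    """
    joint = Counter(pairs)
    n = sum(joint.values())
    if n == 0:
        return 0.0
    # Marginal counts for each agent, derived from the joint counts.
    left, right = Counter(), Counter()
    for (a, b), c in joint.items():
        left[a] += c
        right[b] += c
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


if __name__ == "__main__":
    # Perfectly coordinated agents: knowing one action determines the other.
    coordinated = [("keep", "keep"), ("left", "left")] * 50
    # Independent agents: every action combination is equally likely.
    independent = [(a, b) for a in ("keep", "left") for b in ("keep", "left")]
    print(empirical_mutual_information(coordinated))   # about 0.693 nats (log 2)
    print(empirical_mutual_information(independent))   # 0.0
```

High mutual information signals that agents' choices are coupled. A framework could reward or penalize this coupling to steer agents toward joint exploration rather than redundant individual exploration.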