Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer

📅 2025-02-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In multi-agent reinforcement learning (MARL), Q-value overestimation intensifies as the number of agents grows, undermining training stability. Existing approaches address overestimation only during target Q-value estimation, neglecting its accumulation during the optimization of the online Q-network. This paper proposes a dual-path suppression framework that jointly mitigates overestimation in both the estimation algorithm and the optimization process. The authors introduce an iterative estimation-optimization analysis paradigm, extend stochastic ensemble methods to the estimation of both individual and global target Q-values, and design a hypernetwork regularizer that constrains the weights and biases of the online global Q-network, thereby suppressing overestimation accumulation during optimization. Evaluated on multiple tasks from the Multi-Agent Particle Environment (MPE) and the StarCraft Multi-Agent Challenge (SMAC), the method significantly reduces Q-value overestimation, improves training stability, and enhances final policy performance.
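The random-ensemble lower target described above can be sketched in a few lines. This is a minimal numpy sketch, assuming each ensemble member's target Q-values are stored as an array; the function name and the subset-sampling scheme are illustrative, not the paper's exact implementation.

```python
import numpy as np

def random_ensemble_target(q_targets, subset_size, rng):
    """Form a lower, less overestimated update target by sampling a
    random subset of target Q-estimates and taking their element-wise
    minimum (the randomized-ensemble idea the summary refers to)."""
    idx = rng.choice(len(q_targets), size=subset_size, replace=False)
    return np.stack([q_targets[i] for i in idx]).min(axis=0)

# Example: three ensemble members' target values for two joint actions.
rng = np.random.default_rng(0)
ensemble = [np.array([1.0, 2.0]), np.array([3.0, 0.5]), np.array([2.0, 1.5])]
lower_target = random_ensemble_target(ensemble, subset_size=2, rng=rng)
```

Sampling a strict subset keeps the target stochastic while still biasing it downward; with `subset_size` equal to the ensemble size it reduces to a plain min over all members.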

📝 Abstract
Overestimation in single-agent reinforcement learning has been extensively studied. In contrast, overestimation in the multiagent setting has received comparatively little attention, although it increases with the number of agents and leads to severe learning instability. Previous works concentrate on reducing overestimation in the estimation of the target Q-value but ignore the subsequent optimization of the online Q-network, making it hard to fully address the complex multiagent overestimation problem. To meet this challenge, we first establish an iterative estimation-optimization analysis framework for multiagent value-mixing Q-learning. Our analysis reveals that multiagent overestimation not only comes from the computation of the target Q-value but also accumulates during the optimization of the online Q-network. Motivated by this, we propose the Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer algorithm, which tackles multiagent overestimation from two aspects. First, we extend the random ensemble technique to the estimation of the target individual and global Q-values to derive a lower update target. Second, we propose a novel hypernet regularizer on hypernetwork weights and biases that constrains the optimization of the online global Q-network and prevents overestimation accumulation. Extensive experiments in MPE and SMAC show that the proposed method successfully addresses overestimation across various tasks.
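The hypernet regularizer can be illustrated as an L2-style penalty on the mixing weights and biases that a hypernetwork emits, added to the usual TD loss. The sketch below is a hypothetical numpy illustration, not the paper's exact formulation: `regularized_loss` and the coefficient `lam` are names introduced here for the example.

```python
import numpy as np

def regularized_loss(td_loss, mixer_weights, mixer_biases, lam=1e-3):
    """Augment the TD loss with a squared penalty on the
    hypernetwork-generated mixing weights and biases, discouraging the
    large mixing parameters through which overestimation can accumulate
    in the online global Q-network."""
    penalty = sum(np.sum(w ** 2) for w in mixer_weights)
    penalty += sum(np.sum(b ** 2) for b in mixer_biases)
    return td_loss + lam * penalty

# Example: one mixing layer's generated weights and bias.
loss = regularized_loss(
    td_loss=1.0,
    mixer_weights=[np.array([[1.0, 2.0]])],
    mixer_biases=[np.array([3.0])],
    lam=0.1,
)
```

Because the penalty acts on the hypernetwork's outputs rather than only on the target computation, it constrains the optimization side of the estimation-optimization loop that the abstract identifies as the second source of overestimation.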
Problem

Research questions and friction points this paper is trying to address.

Multiagent overestimation in Q-learning
Iterative estimation-optimization analysis
Hypernet regularizer for optimization control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random ensemble technique
Hypernet regularizer
Iterative estimation-optimization framework