Double Distillation Network for Multi-Agent Reinforcement Learning

📅 2025-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the policy gap and cumulative error arising from partial observability in centralized training with decentralized execution (CTDE), this paper proposes the Double Distillation Network (DDN). Externally, DDN employs global-local knowledge distillation to align training objectives with decentralized execution policies, mitigating the inherent inconsistency of CTDE. Internally, it introduces a state-driven intrinsic reward distillation mechanism to enable robust cooperative exploration under information constraints. DDN is the first framework to integrate nested distillation, combining knowledge distillation, CTDE, intrinsic reward modeling, and soft policy-constraint alignment, within a multi-agent reinforcement learning architecture. Evaluated on multiple standard multi-agent benchmarks, DDN achieves significant improvements in collaborative performance and policy convergence speed, while yielding final policies with superior robustness compared to state-of-the-art CTDE methods.

📝 Abstract
Multi-agent reinforcement learning typically employs a centralized training-decentralized execution (CTDE) framework to alleviate non-stationarity in the environment. However, partial observability during execution may lead to cumulative gap errors accumulated by agents, impairing the training of effective collaborative policies. To overcome this challenge, we introduce the Double Distillation Network (DDN), which incorporates two distillation modules aimed at enhancing robust coordination and facilitating the collaboration process under constrained information. The external distillation module uses a global guiding network and a local policy network, employing distillation to reconcile the gap between global training and local execution. In addition, the internal distillation module introduces intrinsic rewards, drawn from state information, to enhance the exploration capabilities of agents. Extensive experiments demonstrate that DDN significantly improves performance across multiple scenarios.
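The external module distills a globally informed guiding network into a local execution policy. The paper does not reproduce its loss here, but the generic temperature-scaled KL distillation objective such a module would build on can be sketched as follows (function names, the temperature value, and the epsilon are assumptions for illustration, not the paper's exact formulation):

```python
import numpy as np

def softmax(logits, tau=1.0):
    """Temperature-softened action distribution from raw logits."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, tau=2.0, eps=1e-8):
    """KL(teacher || student) between softened action distributions.

    teacher_logits: from a guiding network that sees the global state;
    student_logits: from the local policy that sees only its observation.
    The tau**2 factor keeps gradient magnitudes comparable across temperatures.
    """
    t = softmax(teacher_logits, tau)
    s = softmax(student_logits, tau)
    kl = (t * (np.log(t + eps) - np.log(s + eps))).sum(axis=-1)
    return (tau ** 2) * kl.mean()
```

In a CTDE setting the teacher is only available during centralized training; at execution time agents act from the distilled local policy alone.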
Problem

Research questions and friction points this paper is trying to address.

Addresses non-stationarity in multi-agent environments
Reduces cumulative gap errors from partial observability
Enhances agent collaboration under information constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Double Distillation Network
Global and local distillation
Intrinsic rewards enhancement
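The internal module derives intrinsic rewards from state information to drive exploration. One common distillation-based way to turn states into a novelty bonus is random network distillation (RND), where a predictor is trained to match a fixed random target network and its prediction error serves as the reward. The sketch below illustrates that general idea only; the network sizes, dimensions, and the use of RND itself are assumptions, not DDN's published mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(in_dim, feat_dim, hidden=64):
    """A tiny fixed two-layer ReLU network with random weights."""
    W1 = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, hidden))
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, feat_dim))
    return lambda x: np.maximum(x @ W1, 0.0) @ W2

state_dim, feat_dim = 8, 16
target = make_net(state_dim, feat_dim)     # fixed random "teacher" embedding
predictor = make_net(state_dim, feat_dim)  # would be trained to match target

def intrinsic_reward(states):
    """Per-state prediction error: large on states the predictor has not
    yet learned to match, so it decays as states become familiar."""
    return np.mean((predictor(states) - target(states)) ** 2, axis=-1)
```

During training the predictor's error on frequently visited states shrinks, so the bonus naturally concentrates on novel states, which is the behavior an exploration-oriented intrinsic reward needs under partial observability.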