🤖 AI Summary
This work addresses policy homogenization in multi-agent reinforcement learning caused by naive parameter sharing. To mitigate this issue, the paper is the first to introduce a spectral-domain representation into the parameter-sharing architecture. Specifically, the shared network is factorized via singular value decomposition (SVD): all agents share the singular vector directions while independently learning their own spectral masks over the singular values. This design preserves model scalability while effectively promoting policy diversity. Empirical evaluations demonstrate that the proposed method matches state-of-the-art approaches across multiple benchmarks, including Level-Based Foraging (LBF), SMACv2, and MaMuJoCo, while significantly improving computational and memory efficiency.
📝 Abstract
Parameter sharing is a key strategy in multi-agent reinforcement learning (MARL) for improving scalability, yet conventional fully shared architectures often collapse into homogeneous behaviors. Recent methods introduce diversity through clustering, pruning, or masking, but typically compromise resource efficiency. We propose Prism, a parameter-sharing framework that induces inter-agent diversity by representing shared networks in the spectral domain via singular value decomposition (SVD). All agents share the singular vector directions while learning distinct spectral masks on the singular values. This mechanism encourages inter-agent diversity while preserving scalability. Extensive experiments on both homogeneous (LBF, SMACv2) and heterogeneous (MaMuJoCo) benchmarks show that Prism achieves competitive performance with superior resource efficiency.
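The core mechanism described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the masks below are fixed stand-ins for what Prism would learn per agent, and all names (`agent_weight`, `mask_a`, `mask_b`) are hypothetical. It only shows how shared singular directions plus agent-specific masks over singular values yield distinct effective weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shared weight matrix of one policy-network layer.
W = rng.standard_normal((4, 4))

# Factorize once; all agents share the singular directions U and Vt.
U, s, Vt = np.linalg.svd(W)

# Per-agent spectral masks over the singular values.
# (Hypothetical fixed values; in Prism these would be learned.)
mask_a = np.array([1.0, 1.0, 0.2, 0.0])
mask_b = np.array([0.0, 1.0, 1.0, 1.0])

def agent_weight(mask):
    """Rebuild an agent-specific weight from shared directions + own mask."""
    return U @ np.diag(mask * s) @ Vt

W_a = agent_weight(mask_a)  # agent A's effective layer
W_b = agent_weight(mask_b)  # agent B's effective layer

# Same shared directions, different masks -> diverse per-agent behavior,
# while an all-ones mask recovers the original shared weights exactly.
```

Note that only the mask vectors (one scalar per singular value) are agent-specific, which is how this kind of scheme keeps the per-agent parameter overhead small relative to duplicating full networks.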