FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL

📅 2024-10-21
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the zero-shot out-of-distribution generalization challenge in multi-agent reinforcement learning (MARL) under dynamic entity counts at inference time, e.g., agent addition/removal or obstacle appearance/disappearance. The proposed method, FlickerFusion, is a lightweight stochastic observation-masking mechanism that simulates varying entity compositions via observation-space dropout, without altering the underlying MARL architecture or training objective. To the authors' knowledge, it is the first method to *simultaneously* improve inference reward and reduce policy uncertainty. Evaluated across multiple dynamic-entity benchmarks, FlickerFusion yields an average 23.6% gain in inference reward and a 37.1% reduction in policy uncertainty. The implementation, including code, pretrained models, and interactive visualizations, is publicly released.

📝 Abstract
Multi-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory -- a common occurrence in real-world environments like search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, emulating being in-domain when inferenced OOD. The results show that FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
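The abstract's core idea, stochastically dropping out per-entity blocks of the observation so that a policy trained on a fixed entity count behaves as if entities can vanish, can be sketched roughly as below. This is a minimal illustration, not the paper's released implementation: the function name `flicker_dropout`, the flat per-entity observation layout, and the zero-masking convention are all assumptions for the sake of the example; the actual augmentation lives in the open-sourced code at flickerfusion305.github.io.

```python
import numpy as np

def flicker_dropout(obs, num_entities, feat_dim, drop_prob=0.2, rng=None):
    """Zero out whole per-entity feature blocks of a flat observation at random,
    emulating entities disappearing mid-trajectory (hypothetical sketch).

    obs: 1-D array of length num_entities * feat_dim, one contiguous block
         of feat_dim features per entity.
    """
    rng = np.random.default_rng() if rng is None else rng
    masked = obs.reshape(num_entities, feat_dim).copy()
    keep = rng.random(num_entities) >= drop_prob  # True = entity stays visible
    masked[~keep] = 0.0                           # dropped entities contribute nothing
    return masked.reshape(-1)
```

Because the augmentation only rewrites observations, it leaves the backbone MARL algorithm and its training objective untouched, which is what makes it applicable as a universal add-on in the way the abstract describes.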
Problem

Research questions and friction points this paper is trying to address.

Addresses dynamic entity changes in multi-agent RL
Improves zero-shot out-of-domain generalization performance
Reduces uncertainty in dynamic real-world scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic dropout of observation space
Universal augmentation for MARL methods
Reduces uncertainty in dynamic environments