🤖 AI Summary
In communication-constrained multi-agent partially observable decision-making, heterogeneous agent beliefs in Dec-POMDPs lead to coordination failure and degraded task performance. Method: We propose the first decentralized joint action selection framework for Dec-POMDPs, integrating open-loop multi-agent POMDP modeling, quantitative belief divergence measurement, stochastic optimization, and a conditional, on-demand communication mechanism that dynamically determines data sharing during inference. Contribution/Results: Our framework simultaneously provides probabilistic guarantees, under inconsistent beliefs, on both joint action consistency and task performance. Experiments demonstrate significant improvements over state-of-the-art methods: higher task success rates, enhanced coordination stability, and over 30% reduction in average communication overhead.
📝 Abstract
Multi-agent decision-making under uncertainty is fundamental for effective and safe autonomous operation. In many real-world scenarios, each agent maintains its own belief over the environment and must plan actions accordingly. However, most existing approaches assume that all agents have identical beliefs at planning time, implying these beliefs are conditioned on the same data. Such an assumption is often impractical due to limited communication. In reality, agents frequently operate with inconsistent beliefs, which can lead to poor coordination and suboptimal, potentially unsafe, performance. In this paper, we address this critical challenge by introducing a novel decentralized framework for optimal joint action selection that explicitly accounts for belief inconsistencies. Our approach provides probabilistic guarantees for both action consistency and performance with respect to the open-loop multi-agent POMDP (which assumes all data is always communicated), and selectively triggers communication only when needed. Furthermore, we address a second key question: whether, given a chosen joint action, the agents should share data to improve expected performance during inference. Simulation results show our approach outperforms state-of-the-art algorithms.
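To make the conditional communication idea concrete, the sketch below shows one way an agent might decide whether to share data: compare its own belief against its estimate of a peer's belief using a divergence measure, and communicate only when the divergence exceeds a threshold. The choice of KL divergence, the threshold value, and the function names here are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) between two discrete belief distributions.

    A small epsilon guards against log(0); this is a common
    numerical convenience, not part of the paper's method.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def should_communicate(own_belief, estimated_peer_belief, threshold=0.1):
    """Trigger data sharing only when beliefs have diverged past a
    threshold (hypothetical on-demand communication rule)."""
    return kl_divergence(own_belief, estimated_peer_belief) > threshold

# Identical beliefs: divergence is zero, so no communication is triggered.
b1 = [0.5, 0.3, 0.2]
print(should_communicate(b1, b1))   # False

# Substantially diverged beliefs: communication is triggered.
b2 = [0.1, 0.1, 0.8]
print(should_communicate(b1, b2))   # True
```

In the paper's framework the trigger is tied to probabilistic guarantees on action consistency and performance, rather than a fixed scalar threshold as shown here.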