Bayesian Ego-graph inference for Networked Multi-Agent Reinforcement Learning

📅 2025-09-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited adaptability in networked multi-agent reinforcement learning (Networked-MARL) caused by partial observability and static communication graphs, this paper proposes a decentralized dynamic graph learning framework. Methodologically, each agent learns a sparse ego-graph end-to-end via Bayesian variational inference, jointly optimizing both communication masks and policies to enable context-aware, interpretable, and low-overhead dynamic interaction. The approach trains agents in a fully distributed manner, without global state information or centralized training, by maximizing an evidence lower bound (ELBO) objective that governs local subgraph sampling and message passing. Empirically evaluated on a large-scale traffic control task involving 167 agents, the method substantially outperforms mainstream MARL baselines, achieving simultaneous improvements in task performance, scalability, and communication efficiency.
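The mask-sampling idea in the summary can be illustrated with a small sketch. The paper does not publish this code; the snippet below assumes a relaxed-Bernoulli (Gumbel-Sigmoid) parameterization of the latent communication mask and a Bernoulli sparsity prior, which is one common way to make such a mask differentiable. All names (`sample_mask`, `kl_bernoulli`, the toy message dimensions) are illustrative, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mask(logits, tau=0.5):
    """Relaxed-Bernoulli (Gumbel-Sigmoid) sample of a communication mask
    over an agent's ego-graph edges. A differentiable surrogate for a
    discrete mask; illustrative only, not the authors' implementation."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=logits.shape)
    g = np.log(u) - np.log1p(-u)  # logistic noise
    return 1.0 / (1.0 + np.exp(-(logits + g) / tau))

def kl_bernoulli(q_logits, p=0.2):
    """KL(q || Bernoulli(p)): the sparsity-encouraging term an ELBO
    of this kind would include (prior p assumed here)."""
    q = 1.0 / (1.0 + np.exp(-q_logits))
    q = np.clip(q, 1e-6, 1 - 1e-6)
    return np.sum(q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p)))

# One agent with 4 physical neighbours: the mask decides who to listen to.
edge_logits = np.array([2.0, -1.0, 0.5, -3.0])
mask = sample_mask(edge_logits)
neighbour_msgs = rng.normal(size=(4, 8))  # toy 8-dim messages
aggregated = mask @ neighbour_msgs        # masked message passing

print(mask.round(2), float(kl_bernoulli(edge_logits)))
```

Because the relaxed sample is a smooth function of `edge_logits`, gradients from the policy loss can flow back into the mask parameters, which is what lets topology and policy be optimized jointly.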

📝 Abstract
In networked multi-agent reinforcement learning (Networked-MARL), decentralized agents must act under local observability and constrained communication over fixed physical graphs. Existing methods often assume static neighborhoods, limiting adaptability to dynamic or heterogeneous environments. While centralized frameworks can learn dynamic graphs, their reliance on global state access and centralized infrastructure is impractical in real-world decentralized systems. We propose a stochastic graph-based policy for Networked-MARL, where each agent conditions its decision on a sampled subgraph over its local physical neighborhood. Building on this formulation, we introduce BayesG, a decentralized actor-critic framework that learns sparse, context-aware interaction structures via Bayesian variational inference. Each agent operates over an ego-graph and samples a latent communication mask to guide message passing and policy computation. The variational distribution is trained end-to-end alongside the policy using an evidence lower bound (ELBO) objective, enabling agents to jointly learn both interaction topology and decision-making strategies. BayesG outperforms strong MARL baselines on large-scale traffic control tasks with up to 167 agents, demonstrating superior scalability, efficiency, and performance.
Problem

Research questions and friction points this paper is trying to address.

Decentralized agents must act under local observability with constrained communication
Existing methods assume static neighborhoods, limiting adaptability to dynamic environments
Centralized frameworks require global state access, impractical in real-world systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian variational inference for ego-graphs
Sampling latent communication masks for agents
Joint learning of topology and policy via ELBO
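The third bullet can be made concrete with a per-agent objective of the standard variational form. The notation below is ours, reconstructed from the description of the method, not copied from the paper: $q_\phi$ is agent $i$'s variational distribution over its ego-graph mask $G_i$, $p_\theta$ its policy, $o_i$ its local observation, and $p(G_i)$ a sparse prior.

```latex
\mathcal{L}_i(\theta, \phi) =
  \mathbb{E}_{G_i \sim q_\phi(G_i \mid o_i)}
    \big[ \log p_\theta(a_i \mid o_i, G_i) \big]
  - \mathrm{KL}\big( q_\phi(G_i \mid o_i) \,\|\, p(G_i) \big)
```

Maximizing the first term improves the policy under sampled subgraphs, while the KL term pulls the learned mask toward the sparse prior, trading task performance against communication cost.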
Wei Duan
Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia
Jie Lu
Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia
Junyu Xuan
AAII, University of Technology Sydney
Machine Learning · Bayesian Nonparametric Learning · Information Network · Web Mining · Text Mining