Transformer-Based Scalable Multi-Agent Reinforcement Learning for Networked Systems with Long-Range Interactions

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-agent reinforcement learning (MARL) methods face two key bottlenecks in large-scale network control: (1) difficulty modeling long-range dependencies, such as cascading failures or epidemic spread, and (2) poor generalization across network topologies. To address these, we propose STACCA (Shared Transformer Actor-Critic with Counterfactual Advantage), a MARL framework that pairs a shared Graph Transformer actor with a centralized Graph Transformer critic to capture long-range dependencies and provide system-level feedback. We also design a counterfactual advantage estimator that is compatible with state-value critic estimates and improves credit assignment during training. Because the actor is shared across agents, the learned policy is intended to adapt to different network structures rather than being retrained for each new graph. Evaluated on epidemic containment and rumor-spreading control tasks, STACCA shows improved performance, network generalization, and scalability.

📝 Abstract
Multi-agent reinforcement learning (MARL) has shown promise for large-scale network control, yet existing methods face two major limitations. First, they typically rely on assumptions leading to decay properties of local agent interactions, limiting their ability to capture long-range dependencies such as cascading power failures or epidemic outbreaks. Second, most approaches lack generalizability across network topologies, requiring retraining when applied to new graphs. We introduce STACCA (Shared Transformer Actor-Critic with Counterfactual Advantage), a unified transformer-based MARL framework that addresses both challenges. STACCA employs a centralized Graph Transformer Critic to model long-range dependencies and provide system-level feedback, while its shared Graph Transformer Actor learns a generalizable policy capable of adapting across diverse network structures. Further, to improve credit assignment during training, STACCA integrates a novel counterfactual advantage estimator that is compatible with state-value critic estimates. We evaluate STACCA on epidemic containment and rumor-spreading network control tasks, demonstrating improved performance, network generalization, and scalability. These results highlight the potential of transformer-based MARL architectures to achieve scalable and generalizable control in large-scale networked systems.
Problem

Research questions and friction points this paper is trying to address.

Capturing long-range dependencies in multi-agent network control
Achieving policy generalizability across diverse network topologies
Improving credit assignment for scalable reinforcement learning training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based MARL framework for networked systems
Centralized Graph Transformer Critic models long-range dependencies and provides system-level feedback
Counterfactual advantage estimator improves credit assignment
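
As a rough illustration of why a shared transformer-style policy can be applied to graphs of different sizes, the sketch below runs one set of attention weights over a 5-node and a 12-node ring graph. It is not the STACCA architecture: the single attention head, the additive edge bias, the ring graphs, and every name (init_params, shared_policy, ring_graph, edge_bias) are assumptions made for this example, and the weights are random rather than trained.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(feat_dim, hidden_dim, n_actions, seed=0):
    """Hypothetical parameter set; shapes depend on feature size, not graph size."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wq": mat(hidden_dim, feat_dim),
        "Wk": mat(hidden_dim, feat_dim),
        "Wv": mat(hidden_dim, feat_dim),
        "Wo": mat(n_actions, hidden_dim),
        "edge_bias": 1.0,  # assumed scalar that boosts attention along edges
    }


def shared_policy(params, node_feats, adj):
    """Return one action distribution per node using the same weights for all nodes."""
    q = [matvec(params["Wq"], x) for x in node_feats]
    k = [matvec(params["Wk"], x) for x in node_feats]
    v = [matvec(params["Wv"], x) for x in node_feats]
    scale = math.sqrt(len(q[0]))
    policies = []
    for i, qi in enumerate(q):
        # Every node attends to every other node, so distant nodes can
        # influence the decision; direct neighbors only get an extra bias.
        scores = [
            dot(qi, kj) / scale + params["edge_bias"] * adj[i][j]
            for j, kj in enumerate(k)
        ]
        attn = softmax(scores)
        context = [
            sum(a * vj[d] for a, vj in zip(attn, v)) for d in range(len(v[0]))
        ]
        policies.append(softmax(matvec(params["Wo"], context)))
    return policies


def ring_graph(n, feat_dim, seed):
    """Toy graph: n nodes on a ring with random node features."""
    rng = random.Random(seed)
    feats = [[rng.random() for _ in range(feat_dim)] for _ in range(n)]
    adj = [[1 if abs(i - j) in (1, n - 1) else 0 for j in range(n)] for i in range(n)]
    return feats, adj


# The same parameters are reused on two graphs of different sizes.
params = init_params(feat_dim=3, hidden_dim=4, n_actions=2)
probs_small = shared_policy(params, *ring_graph(5, 3, seed=1))
probs_large = shared_policy(params, *ring_graph(12, 3, seed=2))
print(len(probs_small), len(probs_large))
```

Since none of the weight shapes depend on the number of nodes, the function accepts any graph with matching feature dimension. Whether such a policy performs well on an unseen topology is an empirical question that this sketch does not address.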
Vidur Sinha
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
Muhammed Ustaomeroglu
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
Guannan Qu
Carnegie Mellon University
Machine Learning, Generative AI, Reinforcement Learning, Control Theory, Optimization