NetWorld: Communication-Based Diffusion World Model for Multi-Agent Reinforcement Learning in Wireless Networks

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high interaction cost and poor cross-task generalization of multi-agent reinforcement learning (MARL) in wireless communication networks by proposing NetWorld, a communication-based diffusion world model. NetWorld adopts a distributed training with decentralized execution paradigm, pre-training a conditional diffusion model on offline multi-task data and performing trajectory planning entirely within the model, eliminating online interaction. Key innovations include a shared latent space, a two-hot encoded action-reward representation, an inverse dynamics-based action recovery mechanism, and a lightweight mean-field communication protocol, which collectively mitigate non-stationarity and enhance few-shot cross-task generalization. Experiments demonstrate that NetWorld significantly outperforms existing MARL approaches across three representative wireless networking tasks, exhibiting superior sample efficiency, scalability, and practical deployment potential.

📝 Abstract
As wireless communication networks grow in scale and complexity, diverse resource allocation tasks become increasingly critical. Multi-Agent Reinforcement Learning (MARL) provides a promising solution for distributed control, yet it often requires costly real-world interactions and lacks generalization across diverse tasks. Meanwhile, recent advances in Diffusion Models (DMs) have demonstrated strong capabilities in modeling complex dynamics and supporting high-fidelity simulation. Motivated by these challenges and opportunities, we propose a Communication-based Diffusion World Model (NetWorld) to enable few-shot generalization across heterogeneous MARL tasks in wireless networks. To improve applicability to large-scale distributed networks, NetWorld adopts the Distributed Training with Decentralized Execution (DTDE) paradigm and is organized into a two-stage framework: (i) pre-training a classifier-guided conditional diffusion world model on multi-task offline datasets, and (ii) performing trajectory planning entirely within this world model to avoid additional online interaction. Cross-task heterogeneity is handled via shared latent processing for observations, two-hot discretization for task-specific actions and rewards, and an inverse dynamics model for action recovery. We further introduce a lightweight Mean Field (MF) communication mechanism to reduce non-stationarity and promote coordinated behaviors with low overhead. Experiments on three representative tasks demonstrate improved performance and sample efficiency over MARL baselines, indicating strong scalability and practical potential for wireless network optimization.
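The abstract's "two-hot discretization for task-specific actions and rewards" can be illustrated with a generic sketch: a scalar is represented by splitting probability mass between the two nearest bins of a fixed support, so heterogeneous action and reward scales map onto a common discrete vocabulary. The paper's exact bin layout and encoder are not given here; the function names and the linear-support choice below are illustrative assumptions.

```python
import numpy as np

def two_hot_encode(value, bins):
    """Encode a scalar as a two-hot vector over a fixed, sorted support.

    The mass is split between the two nearest bin centres, in inverse
    proportion to the distance from each, so a linear decode recovers
    the original scalar exactly (within the support's range).
    """
    value = float(np.clip(value, bins[0], bins[-1]))
    # Index of the bin centre at or below the value.
    k = int(np.searchsorted(bins, value, side="right") - 1)
    k = min(k, len(bins) - 2)
    lo, hi = bins[k], bins[k + 1]
    w_hi = (value - lo) / (hi - lo)  # weight on the upper bin
    vec = np.zeros(len(bins))
    vec[k] = 1.0 - w_hi
    vec[k + 1] = w_hi
    return vec

def two_hot_decode(vec, bins):
    """Recover the scalar as the expectation over bin centres."""
    return float(np.dot(vec, bins))
```

For example, with support `np.linspace(-1, 1, 5)`, the value 0.3 encodes as `[0, 0, 0.4, 0.6, 0]` and decodes back to 0.3, which is why this representation is lossless for rewards inside the support while still giving the model a discrete target.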
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Reinforcement Learning
Wireless Networks
Resource Allocation
Generalization
Sample Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion World Model
Multi-Agent Reinforcement Learning
Few-shot Generalization
Distributed Training with Decentralized Execution
Mean Field Communication
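The mean-field communication idea listed above can be sketched in a minimal form: instead of exchanging every neighbour's action, each agent conditions on the element-wise mean of its neighbours' actions, so the policy input stays fixed-size as the network scales. The function names and the concatenation scheme below are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def mean_field_message(neighbor_actions, act_dim):
    """Summarise any number of neighbour actions (vectors of size
    act_dim, e.g. one-hot) as their element-wise mean. The summary
    has constant size regardless of neighbourhood size, which is
    what keeps the communication overhead low."""
    if len(neighbor_actions) == 0:
        return np.zeros(act_dim)
    return np.mean(np.asarray(neighbor_actions, dtype=float), axis=0)

def augmented_observation(obs, neighbor_actions, act_dim):
    """An agent's effective policy input: its local observation
    concatenated with the mean-field summary of its neighbours."""
    return np.concatenate(
        [np.asarray(obs, dtype=float),
         mean_field_message(neighbor_actions, act_dim)]
    )
```

Averaging two one-hot neighbour actions `[1, 0, 0]` and `[0, 1, 0]` yields `[0.5, 0.5, 0]`, a distribution over the neighbourhood's behaviour; conditioning on this summary rather than on raw joint actions is the standard mean-field route to reducing non-stationarity in large agent populations.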