🤖 AI Summary
In SWIPT-enabled satellite-terrestrial heterogeneous networks, time-varying channels and multi-layer interference pose significant challenges for distributed beamforming and power-splitting optimization.
Method: This paper proposes the Decentralized World Model with Reasoning Offloading (DWM-RO) framework, which combines per-agent world modeling with reasoning offloading to the edge. Each agent predicts environment dynamics with its world model and trains its policy on imagined rollouts. An uncertainty-driven offloading gate triggers coordination only when needed, and a lightweight latent decorrelation module at the edge refines agent representations so that the resulting policies are orthogonal, keeping coordination overhead low.
Results: Experiments show that DWM-RO converges 5× faster than state-of-the-art MARL baselines, improves spectral efficiency by 34.7%, and reduces constraint violations by 40%. In dense 10-user scenarios, its violation rate stays below 20% while baselines exceed 70%, indicating better robustness and scalability.
📝 Abstract
Wireless networks are undergoing a paradigm shift toward massive connectivity with energy-efficient operation, driving the integration of satellite-terrestrial architectures with simultaneous wireless information and power transfer (SWIPT). Optimizing transmit beamforming and power splitting in such systems faces formidable challenges, e.g., time-varying channels and multi-tier interference. These create a complex decision landscape in which conventional model-free multi-agent reinforcement learning (MARL) suffers from sample inefficiency, due to rarely encountered state transitions, and from poor coordination, as decentralized agents act independently. This paper proposes the Decentralized World Model with Reasoning Offloading (DWM-RO) framework to address these fundamental limitations. Specifically, each agent employs a world model to learn compact predictive representations of environment dynamics, enabling imagination-based policy training that dramatically reduces required environment interactions. An uncertainty-aware offloading gate monitors local interference levels and model reconstruction errors to trigger selective edge coordination. When activated, a lightweight latent decorrelation mechanism at the edge refines agents' strategic representations, guiding them toward orthogonal actions that minimize resource conflicts. Extensive simulations demonstrate that DWM-RO converges 5 times faster than state-of-the-art baselines while achieving 34.7% higher spectral efficiency and reducing constraint violations by 40%. In dense network scenarios with 10 users, DWM-RO maintains violation rates below 20% while baselines exceed 70%, validating superior robustness.
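The gate-then-decorrelate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the two monitored signals are combined or how decorrelation is measured. The sketch assumes a simple threshold rule (offload when either signal exceeds its threshold) and a mean squared cosine similarity penalty between agent latent vectors. The names `OffloadGate`, `should_offload`, and `decorrelation_penalty`, along with both thresholds, are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class OffloadGate:
    """Hypothetical uncertainty-aware gate (assumed threshold rule)."""
    interference_threshold: float  # assumed tunable, not from the paper
    recon_error_threshold: float   # assumed tunable, not from the paper

    def should_offload(self, interference: float, recon_error: float) -> bool:
        # Assumption: request edge coordination when EITHER the local
        # interference or the world-model reconstruction error is high.
        return (interference > self.interference_threshold
                or recon_error > self.recon_error_threshold)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def decorrelation_penalty(latents):
    """Mean squared cosine similarity over all agent pairs.

    The value is 0.0 when all latent vectors are mutually orthogonal and
    grows toward 1.0 as they align, so minimizing it pushes agents toward
    orthogonal representations (an assumed stand-in for the paper's
    latent decorrelation mechanism).
    """
    pairs = list(combinations(latents, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) ** 2 for u, v in pairs) / len(pairs)


gate = OffloadGate(interference_threshold=1.0, recon_error_threshold=0.5)
calm = gate.should_offload(interference=0.2, recon_error=0.1)     # False
crowded = gate.should_offload(interference=1.5, recon_error=0.1)  # True
```

In this sketch an agent would call `should_offload` at each step and, only when it returns `True`, send its latent vector to the edge, where a penalty such as `decorrelation_penalty` could be minimized across the participating agents. Keeping the gate local is what makes the coordination selective rather than continuous.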