🤖 AI Summary
To address the lack of causal reasoning, scenario simulation, and uncertainty awareness in O-RAN near-real-time (Near-RT) control, which the authors consider central to 6G intelligence, this paper proposes a control framework that combines counterfactual dynamics with world models. Physical resource blocks (PRBs) are treated as first-class control inputs in a causal world model. The paper introduces WM-MS3M, a world model that couples multi-scale structured state-space mixtures (MS3M) with a compact stochastic latent variable. An agentic model predictive control (MPC) planner based on the cross-entropy method (CEM) operates over short horizons. The approach is aimed at interpretable, low-latency decision-making. On realistic O-RAN traces, WM-MS3M reduces MAE by 1.69% relative to MS3M with 32% fewer parameters and similar latency, and it achieves 35-80% lower RMSE than attention/hybrid baselines with 2.3-4.1× faster inference. It also supports rare-event simulation and offline policy screening.
📝 Abstract
We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose: to simulate future scenarios, weigh trade-offs, and act with calibrated uncertainty. We reframe open radio access network (O-RAN) near-real-time (Near-RT) control via counterfactual dynamics and a world modeling (WM) paradigm that learns an action-conditioned generative state space. This enables quantitative "what-if" forecasting beyond large language models (LLMs) as the primary modeling primitive. Actions such as physical resource blocks (PRBs) are treated as first-class control inputs in a causal world model, and both aleatoric and epistemic uncertainty are modeled for prediction and what-if analysis. An agentic, model predictive control (MPC)-based cross-entropy method (CEM) planner operates over short horizons, using prior-mean rollouts within data-driven PRB bounds to maximize a deterministic reward. The model couples multi-scale structured state-space mixtures (MS3M) with a compact stochastic latent to form WM-MS3M, summarizing key performance indicator (KPI) histories and predicting next-step KPIs under hypothetical PRB sequences. On realistic O-RAN traces, WM-MS3M cuts mean absolute error (MAE) by 1.69% versus MS3M with 32% fewer parameters and similar latency, and achieves 35-80% lower root mean squared error (RMSE) than attention/hybrid baselines with 2.3-4.1x faster inference, enabling rare-event simulation and offline policy screening.
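The planning loop described in the abstract (a short-horizon CEM search over PRB sequences, scored by deterministic rollouts within PRB bounds) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: `toy_prior_mean_step` is a hypothetical stand-in for the learned WM-MS3M prior-mean transition, `toy_reward` is an invented reward, PRBs are treated as continuous values, and the bounds, horizon, and population sizes are arbitrary.

```python
import numpy as np


def toy_prior_mean_step(state, prb):
    """Hypothetical stand-in for the world model's prior-mean transition.

    A scalar KPI relaxes toward a level proportional to the allocated PRBs.
    The real model would be a learned, action-conditioned state space.
    """
    return 0.7 * state + 0.3 * prb


def toy_reward(state, prb, target=30.0, prb_cost=0.05):
    """Invented deterministic reward: track a KPI target, penalize PRB use."""
    return -abs(state - target) - prb_cost * prb


def cem_plan(state0, prb_low, prb_high, horizon=3, pop=64, n_elite=8,
             iters=5, seed=0):
    """Cross-entropy method search over PRB sequences of length `horizon`."""
    rng = np.random.default_rng(seed)
    # Start with a broad Gaussian centred in the allowed PRB range.
    mean = np.full(horizon, 0.5 * (prb_low + prb_high))
    std = np.full(horizon, 0.5 * (prb_high - prb_low))
    for _ in range(iters):
        # Sample candidate sequences and clip them to the PRB bounds.
        cand = rng.normal(mean, std, size=(pop, horizon))
        cand = np.clip(cand, prb_low, prb_high)
        returns = np.empty(pop)
        for i, seq in enumerate(cand):
            # Deterministic rollout: no noise is sampled in the transition.
            s, total = state0, 0.0
            for a in seq:
                s = toy_prior_mean_step(s, a)
                total += toy_reward(s, a)
            returns[i] = total
        # Refit the sampling distribution to the highest-return candidates.
        elite = cand[np.argsort(returns)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # keep a little exploration
    return mean


# MPC-style usage: plan a short sequence, apply only the first action,
# then replan at the next control step from the newly observed state.
plan = cem_plan(state0=20.0, prb_low=10.0, prb_high=50.0)
first_prb = plan[0]
```

In a receding-horizon setting only `first_prb` would be applied before replanning. A practical version would presumably round or otherwise discretize the PRB allocation and derive the bounds from data, as the abstract indicates; neither detail is specified there, so the sketch leaves them out.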