🤖 AI Summary
To address the weak generalization of AI models in telecom networks caused by scarce real-world data, this paper proposes a dual-strategy framework to bridge the sim-to-real gap between digital twin simulations and physical deployments. Methodologically: (1) Bayesian learning is introduced at the environment level to enable dynamic calibration of the digital twin; (2) a prediction-powered inference mechanism is designed at the loss level to explicitly model and suppress simulation biases. Together, these components establish a robust training paradigm that actively perceives and mitigates simulation–reality discrepancies. Experiments demonstrate that the framework significantly improves both deployment performance and cross-scenario generalization of AI models trained solely on synthetic data. It thus provides a transferable, high-fidelity modeling and training pathway for digital-twin-driven communication intelligence.
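The environment-level calibration idea can be illustrated with a minimal sketch: Bayesian inference over a single digital-twin parameter from real measurements. The log-distance path-loss model, the exponent `n`, and all numbers below are illustrative assumptions, not details from the paper.

```python
import math
import random

random.seed(0)

def rss_model(d, n, p0=-40.0):
    # Hypothetical log-distance path-loss model: RSS (dBm) at distance d (m)
    return p0 - 10.0 * n * math.log10(d)

# "Real-world" measurements, simulated here with true exponent n = 3.0
true_n, sigma = 3.0, 2.0
data = [(d, rss_model(d, true_n) + random.gauss(0, sigma))
        for d in (random.uniform(10, 200) for _ in range(50))]

# Uniform prior over a grid of candidate exponents; Gaussian likelihood
n_grid = [2.0 + 0.01 * i for i in range(201)]
log_post = [-0.5 * sum(((rss - rss_model(d, n)) / sigma) ** 2
                       for d, rss in data)
            for n in n_grid]

# MAP estimate: the calibrated exponent the digital twin would adopt
n_map = n_grid[max(range(len(n_grid)), key=lambda i: log_post[i])]
print(f"MAP path-loss exponent: {n_map:.2f}")
```

A full treatment would maintain the whole posterior (not just the MAP point) so the twin can express its remaining uncertainty about the environment, which is what makes the training downstream gap-aware.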
📝 Abstract
Training effective artificial intelligence models for telecommunications is challenging due to the scarcity of deployment-specific data. Real data collection is expensive, and available datasets often fail to capture the unique operational conditions and contextual variability of the network environment. Digital twinning provides a potential solution to this problem, as simulators tailored to the current network deployment can generate site-specific data to augment the available training datasets. However, there is a need to develop solutions to bridge the inherent simulation-to-reality (sim-to-real) gap between synthetic and real-world data. This paper reviews recent advances in two complementary strategies: 1) the calibration of digital twins (DTs) through real-world measurements, and 2) the use of sim-to-real gap-aware training strategies to robustly handle residual discrepancies between digital twin-generated and real data. For the latter, we evaluate two conceptually distinct methods that model the sim-to-real gap either at the level of the environment via Bayesian learning or at the level of the training loss via prediction-powered inference.
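The loss-level strategy can be sketched with the core prediction-powered inference (PPI) estimator: a large pool of simulator predictions supplies statistical power, while a small paired real dataset measures and removes the simulator's bias. The scenario (estimating a mean KPI), the +5 bias, and all numbers are illustrative assumptions, not results from the paper.

```python
import random

random.seed(1)

def simulator(x):
    # Stand-in digital-twin prediction with a systematic +5 bias
    return x + 5.0

mean = lambda v: sum(v) / len(v)

# Small labeled set: real outcomes paired with inputs (real y ~= x)
labeled_x = [random.gauss(50, 10) for _ in range(30)]
labeled_y = [x + random.gauss(0, 2) for x in labeled_x]

# Large unlabeled set: only simulator predictions are available
unlabeled_x = [random.gauss(50, 10) for _ in range(5000)]

# Naive estimate trusts the simulator and inherits its bias (~55)
naive = mean([simulator(x) for x in unlabeled_x])

# PPI estimate: simulator mean plus a bias correction ("rectifier")
# measured on the small paired real dataset
rectifier = mean([y - simulator(x) for x, y in zip(labeled_x, labeled_y)])
ppi = naive + rectifier

print(f"naive: {naive:.1f}  PPI: {ppi:.1f}")  # PPI lands near the true mean of 50
```

The same rectifier idea extends from point estimation to training losses: the gradient or loss computed on abundant synthetic data is debiased with a correction term estimated from scarce real measurements.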