🤖 AI Summary
To address severe semantic distortion and reconstruction degradation in semantic video communication under low-bandwidth and dynamic channel conditions, this paper proposes a World Foundation Model (WFM)-based semantic video transmission framework. The framework jointly leverages inter-frame semantic prediction and text-guided future frame generation, incorporating a lightweight deep feedback mechanism for demand-driven transmission. It further introduces segmentation-assisted partial frame restoration and camera-trajectory-driven proactive scheduling—marking the first integration of prediction reliability with real-time channel state optimization. Experimental results demonstrate that the proposed method significantly reduces transmission overhead by an average of 62% across diverse channel conditions, while preserving task-level semantic fidelity, effectively mitigating error accumulation, and enhancing both robustness and efficiency of semantic communication.
📝 Abstract
Semantic communication is a promising technique for emerging wireless applications, which reduces transmission overhead by transmitting only task-relevant features instead of raw data. However, existing methods struggle under extremely low bandwidth and varying channel conditions, where corrupted or missing semantics lead to severe reconstruction errors. To resolve this difficulty, we propose a world foundation model (WFM)-aided semantic video transmission framework that leverages the predictive capability of WFMs to generate future frames based on the current frame and textual guidance. This design allows transmissions to be omitted when predictions remain reliable, thereby saving bandwidth. Through WFM's prediction, the key semantics are preserved, yet minor prediction errors tend to amplify over time. To mitigate issue, a lightweight depth-based feedback module is introduced to determine whether transmission of the current frame is needed. Apart from transmitting the entire frame, a segmentation-assisted partial transmission method is proposed to repair degraded frames, which can further balance performance and bandwidth cost. Furthermore, an active transmission strategy is developed for mobile scenarios by exploiting camera trajectory information and proactively scheduling transmissions before channel quality deteriorates. Simulation results show that the proposed framework significantly reduces transmission overhead while maintaining task performances across varying scenarios and channel conditions.