🤖 AI Summary
Traditional wireless video transmission relies on pixel-level coding and neglects semantic redundancy. To address this, the paper proposes WVSC-D, a semantic communication framework for wireless video transmission that incorporates diffusion models. The method has three parts: (1) a semantic encoder maps each original frame to a compact semantic frame, so that video coding operates at the semantic level rather than the pixel level; (2) a reference semantic frame replaces the per-frame motion vectors of conventional video coding, reducing communication overhead; and (3) at the receiver, a decoupled diffusion multi-frame compensation (DDMFC) mechanism generates the compensated current semantic frame through a two-stage conditional diffusion process. Experiments show that WVSC-D achieves a PSNR gain of about 1.8 dB over other DL-based methods such as DVSC while improving bandwidth efficiency.
📝 Abstract
Existing wireless video transmission schemes perform video coding directly at the pixel level and neglect the semantics contained in videos. In this paper, we propose a wireless video semantic communication framework with decoupled diffusion multi-frame compensation (DDMFC), abbreviated as WVSC-D, which brings the idea of semantic communication into wireless video transmission. WVSC-D first encodes the original video frames as semantic frames and then performs video coding on these compact representations, so that video coding takes place at the semantic level rather than the pixel level. To further reduce communication overhead, a reference semantic frame is introduced to replace the per-frame motion vectors used in conventional video coding. At the receiver, DDMFC generates the compensated current semantic frame through a two-stage conditional diffusion process. Combining reference-frame transmission with DDMFC frame compensation improves bandwidth efficiency while maintaining satisfactory video transmission performance. Experimental results show that WVSC-D outperforms other DL-based methods, e.g., DVSC, by about 1.8 dB in PSNR.
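The abstract does not specify DDMFC's two stages, its networks, or its noise schedule. To illustrate only the generic building block it names, a conditional diffusion reverse process, the following is a minimal NumPy sketch of standard DDPM ancestral sampling conditioned on a reference semantic frame. Everything here is an assumption rather than WVSC-D's actual design: the linear schedule, the step count, and the hypothetical `oracle_denoiser`, which simply assumes the target frame equals the condition. A real receiver would use a trained noise predictor and, per the abstract, two decoupled stages.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (assumed; not specified in the abstract)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def oracle_denoiser(x_t, t, cond, alpha_bars):
    """Hypothetical stand-in for a learned noise predictor eps_theta(x_t, t, cond).

    It assumes the clean frame equals `cond` and returns the noise that maps
    `cond` to `x_t` under the forward process. A trained network would go here.
    """
    return (x_t - np.sqrt(alpha_bars[t]) * cond) / np.sqrt(1.0 - alpha_bars[t])


def conditional_ddpm_sample(cond, denoiser, num_steps=50, seed=0):
    """DDPM ancestral sampling conditioned on `cond` (e.g. a reference semantic frame)."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(cond.shape)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps_hat = denoiser(x, t, cond, alpha_bars)
        # Mean of p(x_{t-1} | x_t) computed from the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(cond.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


# Usage: a toy 4-channel 8x8 "semantic frame" used as the condition
cond = np.random.default_rng(1).standard_normal((4, 8, 8))
out = conditional_ddpm_sample(cond, oracle_denoiser)
print(np.allclose(out, cond))  # True
```

With the oracle denoiser, the final noise-free step recovers the condition exactly, which makes the sampler easy to sanity-check. Any real compensation quality would come from replacing the oracle with a learned, condition-aware network.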