🤖 AI Summary
This work proposes a semantic-level video communication framework that eliminates the transmission of raw visual data, thereby addressing the high bandwidth demands, privacy risks, and system complexity inherent in conventional approaches. The method decouples video into motion dynamics—represented by textual captions—and spatial appearance—encoded as compact semantic representations of keyframes. At the receiver, a generative model reconstructs the video from these semantic cues. By pioneering the use of perceptual hallucination in place of raw data transmission, the framework enables end-to-end personalized constraints and flexible trade-offs. It achieves up to a 50,000× data compression gain while preserving privacy, with performance further improving as video scale increases.
📝 Abstract
The existing communication framework mainly aims at accurate reconstruction of source signals to ensure reliable transmission. However, this signal-level fidelity-oriented design often incurs high communication overhead and system complexity, particularly in video communication scenarios where mainstream frameworks rely on transmitting visual data itself, resulting in significant bandwidth consumption. To address this issue, we propose a visual data-free communication framework, Mirage, for extremely efficient video transmission while preserving semantic information. Mirage decomposes video content into two complementary components: temporal sequence information capturing motion dynamics and spatial appearance representations describing overall visual structure. Temporal information is preserved through video captioning, while key frames are encoded into compact semantic representations for spatial appearance. These representations are transmitted to the receiver, where videos are synthesized using generative video models. Since no raw visual data is transmitted, Mirage is inherently privacy-preserving. Mirage also supports personalized adaptation across deployment scenarios. The sender, network, and receiver can independently impose constraints on semantic representation, transmission, and generation, enabling flexible trade-offs between efficiency, privacy, control, and perceptual quality. Experimental results in video transmission demonstrate that Mirage achieves up to a 50000X data-level compression speedup over raw video transmission, with gains expected to scale with larger video content sizes.