High-Fidelity Causal Video Diffusion Models for Real-Time Ultra-Low-Bitrate Semantic Communication

📅 2026-02-14
🤖 AI Summary
This work addresses high-fidelity, causal, real-time video communication at ultra-low bitrates (<0.0003 bpp), a regime in which existing methods struggle to satisfy all three requirements at once. The authors propose a modular video diffusion architecture that combines lossy semantic video coding, which conveys scene structure, with a stream of highly compressed, low-resolution frames that supply texture. The model comprises three components: Semantic Control, a Restoration Adapter, and a Temporal Adapter. An efficient temporal distillation procedure extends the model to causal, real-time synthesis while reducing trainable parameters by 300x and training time by 2x. Across diverse datasets, the method outperforms classical, neural, and generative baselines in perceptual quality, semantic fidelity, and temporal consistency.

📝 Abstract
We introduce a video diffusion model for high-fidelity, causal, and real-time video generation under ultra-low-bitrate semantic communication constraints. Our approach utilizes lossy semantic video coding to transmit the semantic scene structure, complemented by a stream of highly compressed, low-resolution frames that provide sufficient texture information to preserve fidelity. Building on these inputs, we introduce a modular video diffusion model comprising three components: Semantic Control, a Restoration Adapter, and a Temporal Adapter. We further introduce an efficient temporal distillation procedure that enables extension to real-time and causal synthesis, reducing trainable parameters by 300x and training time by 2x, while adhering to communication constraints. Evaluated across diverse datasets, the framework achieves strong perceptual quality, semantic fidelity, and temporal consistency at ultra-low bitrates (<0.0003 bpp), outperforming classical, neural, and generative baselines in extensive quantitative, qualitative, and subjective evaluations.
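To give a sense of scale for the <0.0003 bpp figure, the following minimal sketch converts a bits-per-pixel budget into bits per frame and bits per second. It assumes bpp is averaged over every pixel of every frame. The abstract does not state a resolution or frame rate, so the 1920x1080 at 30 fps used here is purely an illustrative assumption, and the function names are hypothetical.

```python
def bits_per_frame(bpp: float, width: int, height: int) -> float:
    """Average bit budget for one frame at the given bits-per-pixel rate."""
    return bpp * width * height


def bitrate_bps(bpp: float, width: int, height: int, fps: float) -> float:
    """Bits per second implied by a bpp budget (bpp averaged over all frames)."""
    return bits_per_frame(bpp, width, height) * fps


# Illustrative assumption: 1920x1080 video at 30 fps (not stated in the abstract).
BPP_LIMIT = 0.0003
frame_bits = bits_per_frame(BPP_LIMIT, 1920, 1080)
stream_bps = bitrate_bps(BPP_LIMIT, 1920, 1080, 30)

print(f"{frame_bits:.2f} bits per frame, about {frame_bits / 8:.0f} bytes")
print(f"{stream_bps / 1000:.1f} kbit/s")
```

Under these assumed settings the budget works out to roughly 78 bytes per frame and under 19 kbit/s, which is why the semantic stream and the low-resolution frames both have to be so heavily compressed.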
Problem

Research questions and friction points this paper is trying to address.

ultra-low-bitrate
semantic communication
causal video generation
real-time video
high-fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

causal video diffusion
ultra-low-bitrate semantic communication
semantic control
temporal distillation
modular diffusion model
Cem Eteke
School of Computation, Information and Technology, Department of Computer Engineering, Munich Institute of Robotics and Machine Intelligence, Chair of Media Technology, Technical University of Munich, 80333 Munich, Germany
Batuhan Tosun
School of Computation, Information and Technology, Department of Computer Engineering, Munich Institute of Robotics and Machine Intelligence, Chair of Media Technology, Technical University of Munich, 80333 Munich, Germany
Alexander Griessel
School of Computation, Information and Technology, Department of Computer Engineering, Chair of Communication Networks, Technical University of Munich, 80333 Munich, Germany
Wolfgang Kellerer
Professor for Communication Networks at Technical University of Munich
network protocols and architectures, mobile networks, network flexibility, network virtualization and SDN
Eckehard Steinbach
Professor for Media Technology
Media Technology, Image and Video Compression, Multimedia Communication, Haptic Communication, Visual Localization