AI Summary
To address the inefficiency and poor semantic interpretability of video transmission under low-bandwidth, high-noise conditions, this paper proposes a video representation framework structured around object-attribute-relation (OAR) triplets as fundamental semantic units. Unlike end-to-end joint source-channel coding (JSCC) approaches that rely on black-box semantic learning, our method decouples semantic modeling from channel coding: it first extracts and serializes OAR triplets, then designs a semantic-driven JSCC co-optimization mechanism, and finally constructs an OAR-guided generative reconstruction network. Evaluations on a traffic surveillance dataset demonstrate that the proposed method achieves significantly higher reconstruction quality than H.265 at extremely low bit-rates, while markedly improving downstream task performance. To the best of our knowledge, this is the first work to realize semantic-level video communication that is interpretable, editable, and task-adaptive.
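The "extract and serialize OAR triplets" step above implies a compact, machine-readable semantic unit. The paper does not publish a schema, so the following is only a minimal illustrative sketch of what an OAR triplet and its serialization for transmission could look like; the field names (`obj`, `attributes`, `relation`) and the JSON wire format are assumptions, not the authors' actual representation.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple, List

@dataclass
class OARTriplet:
    """One hypothetical object-attribute-relation semantic unit."""
    obj: str                    # object identity, e.g. "car_3"
    attributes: Dict[str, object]  # e.g. {"color": "red", "bbox": [10, 20, 40, 30]}
    relation: Tuple[str, str]   # (predicate, other object), e.g. ("behind", "truck_1")

def serialize_triplets(triplets: List[OARTriplet]) -> bytes:
    """Pack a frame's OAR triplets into a compact byte stream for transmission."""
    payload = [{"o": t.obj, "a": t.attributes, "r": list(t.relation)}
               for t in triplets]
    # separators=(",", ":") drops whitespace, keeping the bit-rate low
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")

def deserialize_triplets(data: bytes) -> List[OARTriplet]:
    """Recover the triplet list at the receiver before generative reconstruction."""
    payload = json.loads(data.decode("utf-8"))
    return [OARTriplet(p["o"], p["a"], (p["r"][0], p["r"][1])) for p in payload]
```

A receiver-side generative network would then condition on the recovered triplets rather than on pixel data, which is what makes the representation interpretable and editable: individual triplets can be inspected or modified before reconstruction.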
Abstract
With the rapid growth of multimedia data volume, there is an increasing need for efficient video transmission in applications such as virtual reality and future video streaming services. Semantic communication is emerging as a vital technique for ensuring efficient and reliable transmission in low-bandwidth, high-noise settings. However, most current approaches focus on joint source-channel coding (JSCC) that depends on end-to-end training. These methods often lack an interpretable semantic representation and struggle with adaptability to various downstream tasks. In this paper, we introduce the use of object-attribute-relation (OAR) as a semantic framework for videos to facilitate low bit-rate coding and enhance the JSCC process for more effective video transmission. We utilize OAR sequences for both low bit-rate representation and generative video reconstruction. Additionally, we incorporate OAR into the image JSCC model to prioritize communication resources for areas more critical to downstream tasks. Our experiments on traffic surveillance video datasets assess the effectiveness of our approach in terms of video transmission performance. The empirical findings demonstrate that our OAR-based video coding method not only outperforms H.265 coding at lower bit-rates but also synergizes with JSCC to deliver robust and efficient video transmission.
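The abstract states that OAR information is fed into the image JSCC model "to prioritize communication resources for areas more critical to downstream tasks." One common way to realize such prioritization is a per-pixel importance map derived from the bounding boxes of semantically important objects, which can then weight the rate allocation or the distortion loss of the JSCC encoder. The sketch below illustrates that idea only; the function name, the weight values, and the mean-normalization are assumptions, not the paper's actual mechanism.

```python
from typing import List, Tuple

def priority_mask(height: int, width: int,
                  boxes: List[Tuple[int, int, int, int]],
                  fg_weight: float = 4.0, bg_weight: float = 1.0) -> List[List[float]]:
    """Build a per-pixel importance map from OAR object boxes (illustrative only).

    boxes: (x, y, box_width, box_height) rectangles of semantically important
    objects. Pixels inside any box receive fg_weight, background bg_weight;
    the map is normalized to mean 1 so the average bit budget is unchanged.
    """
    mask = [[bg_weight] * width for _ in range(height)]
    for x, y, bw, bh in boxes:
        for r in range(y, min(y + bh, height)):
            for c in range(x, min(x + bw, width)):
                mask[r][c] = fg_weight
    mean = sum(sum(row) for row in mask) / (height * width)
    return [[v / mean for v in row] for row in mask]
```

Weighting the JSCC distortion loss by such a map concentrates channel symbols on object regions, so detection or tracking tasks at the receiver degrade more gracefully as the channel worsens than with uniform allocation.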