🤖 AI Summary
In digital semantic communication, conventional joint source-channel coding (JSCC) produces continuous semantic representations that are incompatible with discrete variable-length codewords, which hinders fine-grained bit-level rate control. Method: This paper proposes an end-to-end trainable variable-length semantic encoding framework grounded in the information bottleneck principle. Its architecture handles code length and code content separately, compresses the source directly into a finite codebook, and uses policy gradient optimization to train through non-differentiable operations such as sampling. Contribution/Results: Experiments show that the method achieves high downstream inference quality at low bit rates and outperforms representative baselines compatible with digital semantic communication systems, combining semantic fidelity with bit-level rate control.
📝 Abstract
This paper investigates a key challenge faced by joint source-channel coding (JSCC) in digital semantic communication (SemCom): the incompatibility between existing JSCC schemes, which yield continuous encoded representations, and digital systems, which employ discrete variable-length codewords. This mismatch makes it difficult to achieve physical bit-level rate control with such JSCC approaches for efficient semantic transmission. We propose a novel end-to-end coding (E2EC) framework to address this challenge. The semantic coding problem is formulated by extending the information bottleneck (IB) theory to noisy channels, yielding a tradeoff between bit-level communication rate and semantic distortion. By structurally decomposing the encoding so that code length and code content are handled separately, we construct an end-to-end trainable encoder that supports the direct compression of a data source into a finite codebook. To optimize E2EC across non-differentiable operations such as sampling, we use policy gradient methods to enable gradient-based updates. Experimental results show that E2EC achieves high inference quality at low bit rates, outperforming representative baselines compatible with digital SemCom systems.
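The abstract does not spell out how policy gradient handles the non-differentiable sampling step, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a categorical policy over candidate code lengths, a score-function (REINFORCE) update with a running baseline, and a hypothetical `toy_cost` that stands in for the rate-distortion tradeoff (distortion plus `beta` times the code length in bits). All function names, the cost model, and the hyperparameters are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_cost(length_bits, beta):
    """Hypothetical stand-in for 'distortion + beta * rate'.

    Distortion is assumed to decay with code length; the rate term grows
    linearly with the number of bits. This is not the paper's objective.
    """
    distortion = math.exp(-0.5 * length_bits)
    return distortion + beta * length_bits


def train_length_policy(lengths, beta, steps=5000, lr=0.1, seed=0):
    """Learn a categorical policy over code lengths with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0] * len(lengths)
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sampling the length is the non-differentiable step.
        idx = rng.choices(range(len(lengths)), weights=probs)[0]
        reward = -toy_cost(lengths[idx], beta)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Score-function gradient: d log pi(idx) / d logit_k = 1[k == idx] - pi_k
        for k in range(len(logits)):
            grad_log_prob = (1.0 if k == idx else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
    return softmax(logits)


def expected_cost(probs, lengths, beta):
    """Average cost when lengths are drawn according to probs."""
    return sum(p * toy_cost(n, beta) for p, n in zip(probs, lengths))


if __name__ == "__main__":
    lengths = list(range(1, 9))  # candidate code lengths in bits (assumed)
    beta = 0.05                  # rate-distortion tradeoff weight (assumed)
    policy = train_length_policy(lengths, beta)
    for n, p in zip(lengths, policy):
        print(f"length={n} bits  prob={p:.3f}")
    print("expected cost:", round(expected_cost(policy, lengths, beta), 4))
```

Raising `beta` in this toy setup penalizes longer codewords more heavily, so the learned policy should shift toward shorter lengths, which mirrors the rate-versus-distortion tradeoff described in the abstract.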