🤖 AI Summary
This work addresses the low communication efficiency and poor robustness of token transmission in point cloud Transformers. We propose the first end-to-end joint semantic-channel coding and modulation framework tailored for point cloud semantic tokens. Methodologically, we design a dual-branch Point Transformer encoder to extract structured point tokens, integrate a differentiable modulator, employ Gumbel-softmax reparameterization and soft quantization for semantic-aware symbol generation, and incorporate rate allocation with channel adaptation. Key innovations include semantic-driven modulation symbol generation, an end-to-end trainable joint optimization architecture, and explicit modeling of point cloud geometric–semantic characteristics. Experiments demonstrate that, at identical bitrates, our method achieves over 1 dB PSNR gain in reconstruction over conventional separated schemes and state-of-the-art joint approaches; modulation symbols achieve a 6.2× compression ratio, significantly enhancing semantic fidelity and spectral efficiency for wireless point cloud transmission.
📝 Abstract
In recent years, the Transformer architecture has achieved outstanding performance across a wide range of tasks and modalities. The token, the unified input and output representation in Transformer-based models, has become a fundamental unit of information. In this work, we consider the problem of token communication: how to transmit tokens efficiently and reliably. We choose the point cloud as the information source, since it is a prevailing three-dimensional format with a more complex spatial structure than images or video. We use the set abstraction method to obtain point tokens. To derive a more informative and transmission-friendly representation from these tokens, we then propose a joint semantic-channel coding and modulation (JSCCM) scheme for the token encoder, which maps point tokens to standard digital constellation points (modulated tokens). Specifically, the JSCCM consists of two parallel Point Transformer-based encoders and a differentiable modulator that combines the Gumbel-softmax and soft quantization methods. In addition, we develop a rate allocator and a channel adapter, which enable adaptive generation of high-quality modulated tokens conditioned on both semantic information and channel conditions. Extensive simulations demonstrate that the proposed method outperforms both joint semantic-channel coding and traditional separate coding, achieving over 1 dB gain in reconstruction and a compression ratio of more than 6× in modulated symbols.
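To make the idea of a differentiable modulator more concrete, the following is a minimal NumPy sketch of Gumbel-softmax relaxation over a fixed constellation. It is not the authors' implementation: the 16-QAM constellation, the temperature `tau`, the function names `qam16_constellation` and `gumbel_softmax_modulate`, and the use of a probability-weighted average as the "soft" symbol are all assumptions made for illustration. The abstract does not specify the constellation order, the network that produces the logits, or how soft quantization is combined with the Gumbel-softmax step.

```python
import numpy as np


def qam16_constellation():
    """Return 16-QAM points (complex), normalized to unit average power.

    Assumption: 16-QAM is chosen only for illustration.
    """
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))


def gumbel_softmax_modulate(logits, constellation, tau=1.0, rng=None):
    """Map per-token logits over constellation points to symbols.

    logits: array of shape (num_tokens, num_points), hypothetical encoder output.
    Returns (soft_symbols, hard_symbols, probs).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Sample Gumbel(0, 1) noise via the inverse-CDF trick.
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))

    # Temperature-scaled softmax over the perturbed logits (numerically stable).
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)
    probs = np.exp(y)
    probs = probs / probs.sum(axis=-1, keepdims=True)

    # Soft symbol: probability-weighted average of constellation points.
    # In an autodiff framework this path carries gradients during training.
    soft_symbols = probs @ constellation

    # Hard symbol: nearest discrete choice, as would be sent over the channel.
    hard_symbols = constellation[np.argmax(probs, axis=-1)]
    return soft_symbols, hard_symbols, probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    const = qam16_constellation()
    logits = rng.normal(size=(4, const.size))  # stand-in for encoder output
    soft, hard, _ = gumbel_softmax_modulate(logits, const, tau=0.5, rng=rng)
    print("soft symbols:", soft)
    print("hard symbols:", hard)
```

Lowering `tau` pushes the probabilities toward a one-hot selection, so the soft symbols move closer to actual constellation points; a larger `tau` gives smoother outputs that are easier to train through but lie further from valid symbols.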