AI Summary
To address low token representation efficiency, the challenges of multimodal fusion, and the variance collapse induced by autoregressive modeling in semantic communication for large language models, this paper proposes UniToCom, a unified token-based framework for processing and wireless transmission. UniToCom integrates the Generative Information Bottleneck (GenIB) principle and its σ-GenIB regularization into a causal Transformer-based multimodal large language model (MLLM), enabling joint modeling and end-to-end optimization of discrete and continuous tokens. This design mitigates variance collapse while preserving representational diversity. Notably, UniToCom achieves, for the first time, a deep integration of tokenized perception and MLLMs within communication systems. Experimental results under dynamic channel conditions demonstrate significant improvements in communication efficiency and multimodal reconstruction fidelity, validating UniToCom as a scalable architecture for intelligent semantic communication.
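The σ-GenIB regularizer is only described at a high level here. As a rough illustration, the sketch below shows one way a GenIB-style objective could combine a distortion (reconstruction) term, a rate penalty, and a variance floor that discourages latent collapse. The function name `genib_style_loss`, the MSE/ℓ2 proxies, and the `var_floor` hyperparameter are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def genib_style_loss(z, x_recon, x, beta=1e-3, var_floor=1.0):
    """Illustrative GenIB-style objective (hypothetical formulation).

    z        : latent token embeddings, shape (batch, dim)
    x_recon  : reconstruction generated from z
    x        : original input
    beta     : rate/distortion trade-off weight (bottleneck strength)
    var_floor: minimum per-dimension variance enforced by the
               sigma-style regularizer
    """
    # Distortion term: keep the information needed for reliable generation.
    recon = F.mse_loss(x_recon, x)

    # Rate term: penalize latent magnitude as a crude bottleneck proxy.
    rate = z.pow(2).mean()

    # Variance-preserving regularizer: penalize per-dimension variance
    # falling below a floor, mitigating variance collapse under
    # autoregressive training.
    var = z.var(dim=0)
    var_penalty = F.relu(var_floor - var).mean()

    return recon + beta * rate + var_penalty
```

The key design point is that the third term acts only when variance drops below the floor, so it preserves representational diversity without inflating the rate term.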
Abstract
This letter proposes UniToCom, a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. Specifically, to enable efficient token representations, we propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information while supporting reliable generation across multiple modalities. In this way, GenIB-based tokenization improves communication efficiency and reduces computational complexity. Additionally, we develop $\sigma$-GenIB to address the challenge of variance collapse in autoregressive modeling, maintaining representational diversity and stability. Moreover, we employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens under the next-token prediction paradigm. Simulation results validate the effectiveness and superiority of the proposed UniToCom over baselines under dynamic channel conditions. By integrating token processing with MLLMs, UniToCom enables scalable and generalizable communication in support of multimodal understanding and generation, offering a potential solution for next-generation intelligent communications.
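The unified next-token prediction over discrete and continuous tokens can be pictured as a single causal backbone with two output heads: a classification head over a discrete codebook and a regression head for continuous token embeddings. The toy PyTorch module below is a minimal sketch of that idea; the class name `UnifiedTokenHead`, layer sizes, and head designs are assumptions for illustration and do not reflect the paper's actual architecture.

```python
import torch
import torch.nn as nn

class UnifiedTokenHead(nn.Module):
    """Toy receiver sketch: one causal backbone, two prediction heads."""

    def __init__(self, dim=512, vocab=8192, cont_dim=64):
        super().__init__()
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Logits over a discrete token codebook (trained with cross-entropy).
        self.discrete_head = nn.Linear(dim, vocab)
        # Regression of continuous token embeddings (trained with e.g. MSE).
        self.continuous_head = nn.Linear(dim, cont_dim)

    def forward(self, token_embeds):
        # Causal mask so each position attends only to earlier tokens,
        # matching the next-token prediction paradigm.
        L = token_embeds.size(1)
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
        h = self.backbone(token_embeds, mask=mask)
        return self.discrete_head(h), self.continuous_head(h)
```

Sharing one causal backbone across both token types is what lets a single MLLM handle understanding (discrete prediction) and generation (continuous reconstruction) within the same next-token framework.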