Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach

πŸ“… 2025-07-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address low token-representation efficiency, the difficulty of multimodal fusion, and the variance collapse induced by autoregressive modeling in semantic communication for large language models, this paper proposes UniToCom, a unified token-based framework for processing and wireless transmission. UniToCom integrates the Generative Information Bottleneck (GenIB) principle and its σ-GenIB regularization into a causal Transformer-based multimodal large language model (MLLM), enabling joint modeling and end-to-end optimization of discrete and continuous tokens. This design mitigates variance collapse while preserving representational diversity. Notably, UniToCom is the first to deeply integrate tokenized perception with MLLMs inside a communication system. Experiments under dynamic channel conditions show clear gains in communication efficiency and multimodal reconstruction fidelity, supporting UniToCom as a scalable architecture for intelligent semantic communication.
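The letter's GenIB objective is not reproduced here, but an information-bottleneck-style loss with a variance floor conveys the core idea: trade reconstruction fidelity against a rate penalty on the token latents, while keeping the per-dimension token variance from collapsing under autoregressive training. The sketch below is a minimal illustration under assumed names and penalty forms (`genib_loss`, `sigma_floor`, the hinge penalty), not the paper's exact σ-GenIB formulation.

```python
import torch
import torch.nn.functional as F

def genib_loss(z_mu, z_logvar, recon, target, beta=1e-3, sigma_floor=0.1):
    """Illustrative GenIB-style objective (assumed form, not the paper's).

    distortion: reconstruction error of the generated output.
    rate: KL(q(z|x) || N(0, I)), bounding the information tokens carry.
    collapse penalty: hinge keeping per-dimension token std above
    sigma_floor, mimicking the anti-collapse role of sigma-GenIB.
    """
    distortion = F.mse_loss(recon, target)
    rate = -0.5 * torch.mean(1.0 + z_logvar - z_mu.pow(2) - z_logvar.exp())
    std = torch.exp(0.5 * z_logvar)
    collapse_penalty = F.relu(sigma_floor - std).mean()
    return distortion + beta * rate + collapse_penalty
```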

πŸ“ Abstract
This letter proposes UniToCom, a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. Specifically, to enable efficient token representations, we propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information while supporting reliable generation across multiple modalities. By doing this, GenIB-based tokenization is conducive to improving the communication efficiency and reducing computational complexity. Additionally, we develop $\sigma$-GenIB to address the challenges of variance collapse in autoregressive modeling, maintaining representational diversity and stability. Moreover, we employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens under the next-token prediction paradigm. Simulation results validate the effectiveness and superiority of the proposed UniToCom compared to baselines under dynamic channel conditions. By integrating token processing with MLLMs, UniToCom enables scalable and generalizable communication in favor of multimodal understanding and generation, providing a potential solution for next-generation intelligent communications.
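Because tokens are the units that cross the air interface, the channel can be applied directly to token embeddings. As a concrete stand-in for the dynamic channels evaluated in the letter (an assumption, since the exact channel models are not given here), a toy AWGN channel illustrates that interface; the function name and SNR handling are illustrative only:

```python
import torch

def awgn_channel(tokens: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Toy AWGN channel over continuous token embeddings (assumed model).

    A dynamic channel, as evaluated in the letter, would vary snr_db
    (or add fading) over time; here the SNR is fixed per call.
    """
    snr_linear = 10.0 ** (snr_db / 10.0)
    signal_power = tokens.pow(2).mean()
    noise_std = torch.sqrt(signal_power / snr_linear)
    return tokens + noise_std * torch.randn_like(tokens)
```

Evaluating robustness then amounts to sweeping `snr_db` and measuring reconstruction fidelity at the receiver.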
Problem

Research questions and friction points this paper is trying to address.

Efficient token representation for multimodal communication
Preventing variance collapse in autoregressive token modeling
Unifying discrete and continuous token processing in MLLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified token communication paradigm for processing and transmission
Generative information bottleneck for efficient token representation
Causal Transformer-based MLLM for unified token processing
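The last point is the easiest to make concrete: a causal Transformer can serve both token types if its backbone output feeds two heads, a softmax over a codebook for discrete tokens and a regression head for continuous ones, trained jointly under next-token prediction. The sketch below is an assumption about this mechanism (the class name `UnifiedTokenHead`, layer sizes, and loss choices are invented), not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedTokenHead(nn.Module):
    """Illustrative dual head on a causal Transformer backbone (assumed).

    Discrete tokens: next-token cross-entropy over a codebook.
    Continuous tokens: regression on the next latent vector.
    """

    def __init__(self, d_model=512, vocab_size=8192, cont_dim=64):
        super().__init__()
        self.discrete_head = nn.Linear(d_model, vocab_size)
        self.continuous_head = nn.Linear(d_model, cont_dim)

    def forward(self, h, discrete_targets=None, continuous_targets=None):
        # h: (batch, seq, d_model) hidden states from the backbone.
        loss = h.new_zeros(())
        if discrete_targets is not None:  # (batch, seq) codebook indices
            logits = self.discrete_head(h)
            loss = loss + F.cross_entropy(
                logits.flatten(0, 1), discrete_targets.flatten())
        if continuous_targets is not None:  # (batch, seq, cont_dim)
            pred = self.continuous_head(h)
            loss = loss + F.mse_loss(pred, continuous_targets)
        return loss
```

Both losses flow through the same backbone, which is what lets one next-token predictor cover text-like and signal-like modalities at once.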
πŸ”Ž Similar Papers
No similar papers found.
Hao Wei
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
Wanli Ni
Tsinghua University
wireless communication, machine learning
Wen Wang
Pervasive Communications Center, Purple Mountain Laboratories, Nanjing 211111, China
Wenjun Xu
Peng Cheng Laboratory
machine learning, reinforcement learning, flexible/soft robot
Dusit Niyato
College of Computing and Data Science, Nanyang Technological University, Singapore 117583
Ping Zhang
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; Department of Mathematics and Theories, Peng Cheng Laboratory, Shenzhen 518066, China