AI Summary
To address low token representation efficiency, the challenges of multimodal fusion, and the variance collapse induced by autoregressive modeling in semantic communication for large language models, this paper proposes UniToCom, a unified token-based framework for processing and wireless transmission. UniToCom integrates the Generative Information Bottleneck (GenIB) principle and its σ-GenIB regularization into a causal Transformer-based multimodal large language model (MLLM), enabling joint modeling and end-to-end optimization of discrete and continuous tokens. This design mitigates variance collapse while preserving representational diversity. Notably, UniToCom achieves, for the first time, a deep integration of tokenized perception and MLLMs within communication systems. Experimental results under dynamic channel conditions demonstrate significant improvements in communication efficiency and multimodal reconstruction fidelity, validating UniToCom as a scalable architecture for intelligent semantic communication.
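The σ-GenIB regularizer is only described at a high level here. As a rough illustration, the sketch below shows one way a GenIB-style objective could combine a distortion (reconstruction) term, a rate penalty, and a variance floor that discourages latent collapse. The function name `genib_style_loss`, the MSE/ℓ2 proxies, and the `var_floor` hyperparameter are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def genib_style_loss(z, x_recon, x, beta=1e-3, var_floor=1.0):
    """Illustrative GenIB-style objective (hypothetical formulation).

    z        : latent token embeddings, shape (batch, dim)
    x_recon  : reconstruction generated from z
    x        : original input
    beta     : rate/distortion trade-off weight (bottleneck strength)
    var_floor: minimum per-dimension variance enforced by the
               sigma-style regularizer
    """
    # Distortion term: keep the information needed for reliable generation.
    recon = F.mse_loss(x_recon, x)

    # Rate term: penalize latent magnitude as a crude bottleneck proxy.
    rate = z.pow(2).mean()

    # Variance-preserving regularizer: penalize per-dimension variance
    # falling below a floor, mitigating variance collapse under
    # autoregressive training.
    var = z.var(dim=0)
    var_penalty = F.relu(var_floor - var).mean()

    return recon + beta * rate + var_penalty
```

The key design point is that the third term acts only when variance drops below the floor, so it preserves representational diversity without inflating the rate term.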
Abstract
This letter proposes UniToCom, a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. Specifically, to enable efficient token representations, we propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information while supporting reliable generation across multiple modalities. In this way, GenIB-based tokenization improves communication efficiency and reduces computational complexity. Additionally, we develop $\sigma$-GenIB to address the challenge of variance collapse in autoregressive modeling, maintaining representational diversity and stability. Moreover, we employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens under the next-token prediction paradigm. Simulation results validate the effectiveness and superiority of the proposed UniToCom over baselines under dynamic channel conditions. By integrating token processing with MLLMs, UniToCom enables scalable and generalizable communication in support of multimodal understanding and generation, offering a potential solution for next-generation intelligent communications.
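The unified next-token prediction over discrete and continuous tokens can be pictured as a single causal backbone with two output heads: a classification head over a discrete codebook and a regression head for continuous token embeddings. The toy PyTorch module below is a minimal sketch of that idea; the class name `UnifiedTokenHead`, layer sizes, and head designs are assumptions for illustration and do not reflect the paper's actual architecture.

```python
import torch
import torch.nn as nn

class UnifiedTokenHead(nn.Module):
    """Toy receiver sketch: one causal backbone, two prediction heads."""

    def __init__(self, dim=512, vocab=8192, cont_dim=64):
        super().__init__()
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Logits over a discrete token codebook (trained with cross-entropy).
        self.discrete_head = nn.Linear(dim, vocab)
        # Regression of continuous token embeddings (trained with e.g. MSE).
        self.continuous_head = nn.Linear(dim, cont_dim)

    def forward(self, token_embeds):
        # Causal mask so each position attends only to earlier tokens,
        # matching the next-token prediction paradigm.
        L = token_embeds.size(1)
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
        h = self.backbone(token_embeds, mask=mask)
        return self.discrete_head(h), self.continuous_head(h)
```

Sharing one causal backbone across both token types is what lets a single MLLM handle understanding (discrete prediction) and generation (continuous reconstruction) within the same next-token framework.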