Large Speech Model Enabled Semantic Communication

2025-12-04

AI Summary
Existing semantic speech communication systems are constrained by task-specific model architectures, struggling to simultaneously achieve high compression efficiency, acceptable speech quality, and low latency under low-bandwidth, high-packet-loss channel conditions. This paper proposes a large speech model (Moshi)-based semantic communication framework. First, the Mimi codec is employed to generate discrete speech tokens. Second, a content- and channel-aware adaptive controller dynamically adjusts transmission bit rate and redundancy. Third, an in-band unequal error protection mechanism is introduced, coupled with LoRA fine-tuning to enable generative recovery of lost tokens. The system supports variable bit rates from 550 bps to 2.06 kbps, significantly outperforms conventional methods in speech quality under high packet loss, and achieves an end-to-end latency of approximately 460 ms, demonstrating feasibility for real-time deployment.
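The reported bit-rate range can be sanity-checked with simple token arithmetic. The sketch below assumes Mimi's publicly documented settings (12.5 token frames per second, 2048-entry codebooks); neither value is stated in this summary. Mapping the 550 bps and 2.06 kbps endpoints to 4 and 15 codebooks is an inference from the arithmetic, not a claim made by the paper, and the reported rates may also include redundancy overhead.

```python
import math

# Assumed Mimi parameters (taken from the public Moshi/Mimi release,
# NOT from this summary).
FRAME_RATE_HZ = 12.5   # token frames per second
CODEBOOK_SIZE = 2048   # entries per codebook


def mimi_bitrate_bps(num_codebooks: int) -> float:
    """Raw token bit rate when `num_codebooks` codebook levels are sent per frame."""
    bits_per_token = math.log2(CODEBOOK_SIZE)  # 11 bits for 2048 entries
    return FRAME_RATE_HZ * num_codebooks * bits_per_token


for k in (4, 8, 15):
    print(f"{k:2d} codebooks -> {mimi_bitrate_bps(k):.1f} bps")
```

Under these assumptions, 4 codebooks give 550 bps and 15 codebooks give 2062.5 bps, which lines up with the endpoints quoted above.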

Abstract
Existing speech semantic communication systems, mainly based on Joint Source-Channel Coding (JSCC) architectures, have demonstrated impressive performance, but their effectiveness remains limited by model structures designed for particular tasks and datasets. Recent advances indicate that generative large models pre-trained on massive datasets can achieve exceptional performance across diverse downstream tasks with minimal fine-tuning. To exploit the rich semantic knowledge embedded in large models and enable adaptive transmission over lossy channels, we propose a Large Speech Model enabled Semantic Communication (LargeSC) system. Simultaneously achieving adaptive compression and robust transmission over lossy channels remains challenging, requiring trade-offs among compression efficiency, speech quality, and latency. In this work, we employ Mimi as the speech codec, converting speech into discrete tokens compatible with existing network architectures. We propose an adaptive controller module that enables adaptive transmission and in-band Unequal Error Protection (UEP), dynamically adjusting to both speech content and packet loss probability under bandwidth constraints. Additionally, we employ Low-Rank Adaptation (LoRA) to fine-tune the Moshi foundation model for generative recovery of lost speech tokens. Simulation results show that the proposed system supports bandwidths ranging from 550 bps to 2.06 kbps, outperforms conventional baselines in speech quality under high packet loss rates, and achieves an end-to-end latency of approximately 460 ms, thereby demonstrating its potential for real-time deployment.
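To make the bandwidth versus protection trade-off concrete, here is a minimal, hand-written heuristic for in-band unequal error protection. It is not the paper's controller, which is also content-aware and whose design is not described in this summary. Everything in it is an assumption for illustration: the function names, the 137.5 bps cost per codebook copy (12.5 frames/s times 11 bits), the independent-loss model, the 5% residual-loss target, and the premise that lower-index codebooks matter most.

```python
# Hypothetical sketch: split a fixed bit budget between codebook levels
# and repeated copies of the most important levels.
RATE_PER_COPY_BPS = 12.5 * 11  # assumed cost of one codebook stream (137.5 bps)


def copies_needed(loss_prob: float, target: float) -> int:
    """Smallest n with loss_prob**n <= target, assuming independent packet losses."""
    if not 0.0 <= loss_prob < 1.0:
        raise ValueError("loss_prob must be in [0, 1)")
    n = 1
    while loss_prob ** n > target:
        n += 1
    return n


def plan_transmission(budget_bps: float, loss_prob: float,
                      target: float = 0.05, max_codebooks: int = 15) -> list[int]:
    """Return copies[i], the number of times codebook i is sent per frame."""
    slots = int(budget_bps // RATE_PER_COPY_BPS)  # whole streams that fit the budget
    strongest = copies_needed(loss_prob, target)
    copies: list[int] = []
    for i in range(max_codebooks):
        if slots == 0:
            break
        # Protection tapers off with codebook index, capped by what is left.
        want = min(max(1, strongest - i), slots)
        copies.append(want)
        slots -= want
    if not copies:
        raise ValueError("budget too small to send even one codebook")
    return copies


print(plan_transmission(550, 0.0))     # lossless channel: budget goes to quality
print(plan_transmission(550, 0.3))     # lossy channel: budget shifts to redundancy
print(plan_transmission(2062.5, 0.3))
```

With the same 550 bps budget, the plan shifts from four unprotected codebooks on a clean channel to fewer codebooks with a triplicated first level at 30% loss. Because the redundancy is carried inside the same bit budget, it trades speech quality for robustness rather than adding bandwidth.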
Problem

Research questions and friction points this paper is trying to address.

Enables adaptive semantic speech transmission over lossy channels
Achieves robust compression and recovery under bandwidth and packet loss constraints
Reduces latency for real-time deployment while maintaining speech quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Mimi speech codec for discrete token conversion
Employs adaptive controller with Unequal Error Protection
Applies Low-Rank Adaptation to fine-tune Moshi model
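For readers unfamiliar with Low-Rank Adaptation, the sketch below shows the standard LoRA forward pass in plain Python: the pre-trained weight W stays frozen and only the low-rank factors A and B are trained. Which Moshi layers are adapted, and the rank and scaling used, are not stated in this summary, so the shapes and values here are illustrative assumptions.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """y = W x + (alpha / r) * B (A x).

    W: d_out x d_in frozen weight; A: r x d_in and B: d_out x r are trainable.
    """
    r = len(A)  # LoRA rank = number of rows of A
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d_out, d_in, r):
    """Parameter count of the LoRA factors versus the full weight matrix."""
    return r * (d_in + d_out), d_out * d_in


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy frozen weight
A = [[1.0, 0.0, 0.0]]                                     # rank-1 down-projection
B_zero = [[0.0], [0.0], [0.0]]                            # usual zero init for B
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_zero, x, alpha=2.0))  # identical to the base model
print(trainable_params(4096, 4096, 8))           # (LoRA params, full params)
```

Initializing B to zero means the adapted model starts out identical to the pre-trained one, and the trainable parameter count grows with r * (d_in + d_out) instead of d_out * d_in.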
Yun Tian
School of Electronics, Peking University, Beijing 100091, China
Zhijin Qin
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China, and with the State Key Laboratory of Space Network and Communications, Beijing, 100084, China
Guocheng Lv
School of Electronics, Peking University, Beijing 100091, China
Ye Jin
Chongqing University of Technology
Physics Science, Materials Science, rare earth, luminescence, LED
Kaibin Huang
Professor and Dept. Head, University of Hong Kong; NAI Fellow; IEEE Fellow; Highly Cited Researcher
Machine Learning, Mobile Edge Computing, Wireless Communications, Wireless Power Transfer
Zhu Han
Department of Electrical and Computer Engineering at the University of Houston, Houston, TX 77004 USA, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul, South Korea, 446-701