SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling

📅 2025-01-22

🏛️ IEEE Journal on Selected Areas in Communications

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

To address severe audio quality degradation under high packet-loss rates (up to 40%), this paper proposes SoundSpring, presented as the first semantic audio transceiver that uses a single causal masked language model (Causal MLM) for both audio compression and real-time packet-loss concealment. SoundSpring jointly optimizes compression efficiency and loss resilience in a latent space of neural acoustic features, combining causal sequential masking, neural quantization, and digital packetized transmission. This departs from end-to-end deep JSCC designs that map audio directly onto analog channel symbols. Experimental results show that SoundSpring consistently outperforms state-of-the-art methods on the main objective metrics, including PESQ, STOI, and ViSQOL, improving both perceptual speech quality and robustness to packet loss. By combining semantic representation learning with a transmission design that accounts for channel losses, SoundSpring offers a new approach to semantic communication over unreliable channels.
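To make the receiver-side setting concrete, here is a minimal sketch of digital packetized transmission of discrete latent tokens over an erasure channel. It rests on assumptions not taken from the paper: tokens are grouped into fixed-size packets, each packet is dropped independently with probability `loss_rate` (0.4 matches the highest loss rate cited above), and tokens from lost packets are replaced by a hypothetical `MASK` id so that a concealment model can later fill them in. The helper names `packetize`, `transmit`, and `depacketize` are illustrative only.

```python
import random
from typing import List

MASK = -1  # hypothetical placeholder id for tokens carried by lost packets


def packetize(tokens: List[int], tokens_per_packet: int) -> List[List[int]]:
    """Split a flat token sequence into fixed-size packets (the last may be shorter)."""
    return [tokens[i:i + tokens_per_packet] for i in range(0, len(tokens), tokens_per_packet)]


def transmit(packets: List[List[int]], loss_rate: float, seed: int = 0) -> List[List[int]]:
    """Drop each packet independently with probability loss_rate (i.i.d. erasures).

    A lost packet comes back as a run of MASK ids, so token positions stay
    aligned and the receiver knows exactly which tokens must be concealed.
    """
    rng = random.Random(seed)
    received = []
    for pkt in packets:
        if rng.random() < loss_rate:
            received.append([MASK] * len(pkt))
        else:
            received.append(list(pkt))
    return received


def depacketize(packets: List[List[int]]) -> List[int]:
    """Concatenate received packets back into one token sequence."""
    return [tok for pkt in packets for tok in pkt]


if __name__ == "__main__":
    tokens = list(range(20))  # stand-in for quantized latent token ids
    rx = depacketize(transmit(packetize(tokens, 4), loss_rate=0.4, seed=1))
    lost = sum(t == MASK for t in rx)
    print(rx)
    print(f"{lost} of {len(rx)} tokens masked")
```

Real channels often produce bursty rather than independent losses; the i.i.d. model is used here only because it is the simplest to state.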


📝 Abstract

In this paper, we propose "SoundSpring", a cutting-edge error-resilient audio transceiver that combines the robustness benefits of joint source-channel coding (JSCC) with compatibility with current digital communication systems. Unlike recent deep JSCC transceivers, which learn to map audio signals directly to analog channel-input symbols via neural networks, SoundSpring adopts a layered architecture that separates audio compression from digital coded transmission, while fully exploiting the in-context predictive capabilities of large language (foundation) models. Integrated with a causal-order mask learning strategy, our single model operates in the latent feature domain and serves two functions: as an efficient audio compressor at the transmitter and as an effective packet loss concealment mechanism at the receiver. By jointly optimizing for both audio compression efficiency and transmission error resilience, we show that mask-learned language models are indeed powerful contextual predictors, and our dual-functional compression and concealment framework offers fresh perspectives on the application of foundation language models in audio communication. Through extensive experimental evaluations, we establish that SoundSpring clearly outperforms contemporary audio transmission systems in terms of signal fidelity metrics and perceptual quality scores. These findings not only support the practical deployment of SoundSpring in learning-based audio communication systems but also point toward the development of future audio semantic transceivers.
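The dual-function idea in the abstract rests on one observation: a model that predicts the next latent token well both lowers the entropy-coding cost at the transmitter and can guess tokens from lost packets at the receiver. The sketch below illustrates that coupling with a deliberately simple bigram model with add-one smoothing, swapped in for illustration only. SoundSpring itself uses a causal-order masked language model over neural audio latents, and the names `BigramContextModel`, `code_length_bits`, `conceal`, and `MASK` are hypothetical. Code length is computed as the ideal sum of -log2 p that an arithmetic coder driven by the model would approach, not by running an actual coder.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

MASK = -1  # hypothetical id marking tokens from lost packets


class BigramContextModel:
    """Toy contextual predictor p(x_t | x_{t-1}) with add-one smoothing.

    Illustrative stand-in only: SoundSpring uses a causal-order masked
    language model, not a bigram table.
    """

    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size
        self.pair_counts: Dict[int, Counter] = defaultdict(Counter)
        self.unigram: Counter = Counter()

    def fit(self, sequences: List[List[int]]) -> "BigramContextModel":
        for seq in sequences:
            self.unigram.update(seq)
            for prev, cur in zip(seq, seq[1:]):
                self.pair_counts[prev][cur] += 1
        return self

    def prob(self, prev: int, cur: int) -> float:
        row = self.pair_counts[prev]
        return (row[cur] + 1) / (sum(row.values()) + self.vocab_size)

    def code_length_bits(self, seq: List[int]) -> float:
        """Transmitter role: ideal entropy-coded size, in bits, of an unmasked sequence.

        The first token is charged log2(vocab_size) bits (uniform prior);
        every later token costs -log2 p(token | previous token).
        """
        bits = math.log2(self.vocab_size)
        for prev, cur in zip(seq, seq[1:]):
            bits -= math.log2(self.prob(prev, cur))
        return bits

    def conceal(self, seq: List[int]) -> List[int]:
        """Receiver role: fill MASK tokens left to right with the most probable token."""
        out: List[int] = []
        for tok in seq:
            if tok != MASK:
                out.append(tok)
            elif out:
                prev = out[-1]  # context may itself be a concealed token
                out.append(max(range(self.vocab_size), key=lambda c: self.prob(prev, c)))
            else:
                # No left context: fall back to the most frequent training token.
                out.append(self.unigram.most_common(1)[0][0] if self.unigram else 0)
        return out


if __name__ == "__main__":
    model = BigramContextModel(vocab_size=4).fit([[0, 1, 2, 3] * 3])
    print(model.conceal([0, 1, MASK, 3, 0, MASK, 2, 3]))  # → [0, 1, 2, 3, 0, 1, 2, 3]
    # A sequence that follows the learned pattern is cheaper to code than one that does not.
    print(model.code_length_bits([0, 1, 2, 3] * 2), model.code_length_bits([3, 2, 1, 0] * 2))
```

Both the lower code length for in-pattern sequences and the correct filling of masked slots come from the same probability table, which is the intuition behind training one model for both roles. Greedy left-to-right filling is only one possible concealment rule; a masked model can also condition on context that arrives after the gap.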

Problem

Research questions and friction points this paper is trying to address.

Audio Transmission System

Signal Degradation

Packet Loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint Source-Channel Coding

Large Language Models

Audio Semantic Transceivers
