AI Summary
Conventional channel protection mechanisms struggle with packet loss in bandwidth- and latency-constrained wireless networks, leaving real-time speech insufficiently robust. To address this, we propose Glaris, a semantic communication framework that improves error resilience while remaining compatible with existing digital communication systems. Its core innovation is a speech encoder guided by generative latent priors, which jointly optimizes semantic fidelity and reconstruction quality within the generative model's latent space. Glaris also combines a lightweight forward error correction (FEC) scheme with a latent-prior-driven packet loss concealment (PLC) mechanism to suppress error propagation and enable semantic-level fault tolerance. Experiments on LibriSpeech demonstrate that Glaris achieves joint source-channel coding (JSCC)-level robustness with significantly lower redundancy overhead, striking a superior trade-off between transmission efficiency and speech quality.
Abstract
Real-time speech communication over wireless networks remains challenging, as conventional channel protection mechanisms cannot effectively counter packet loss under stringent bandwidth and latency constraints. Semantic communication has emerged as a promising paradigm for enhancing the robustness of speech transmission by means of joint source-channel coding (JSCC). However, its cross-layer design is incompatible with existing digital communication systems, which hinders practical deployment. In such digital systems, the robustness of speech communication is therefore evaluated primarily by its resilience to packet loss over wireless networks. To address these challenges, we propose Glaris, a generative latent-prior-based resilient speech semantic communication framework that performs resilient speech coding in the generative latent space. Generative latent priors enable high-quality packet loss concealment (PLC) at the receiver, balancing semantic consistency and reconstruction fidelity. Additionally, an integrated error resilience mechanism is designed to mitigate error propagation and improve the effectiveness of PLC. Compared with traditional packet-level forward error correction (FEC) strategies, the proposed method achieves enhanced robustness over dynamic wireless networks while significantly reducing redundancy overhead. Experimental results on the LibriSpeech dataset demonstrate that Glaris consistently outperforms existing error-resilient codecs, achieving JSCC-level robustness while maintaining seamless compatibility with existing systems, and that it strikes a favorable balance between transmission efficiency and speech reconstruction quality.
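For context on the packet-level FEC baseline that the abstract compares against, the following is a minimal sketch of one common scheme: a single XOR parity packet per group of k equal-length packets. It is not the Glaris mechanism, which the abstract does not specify. The function names and the grouping are illustrative assumptions. The sketch shows the trade-off in question: the redundancy overhead is 1/k, and at most one lost packet per group can be rebuilt.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(group: List[bytes]) -> List[bytes]:
    """Append one XOR parity packet to a group of equal-length packets."""
    parity = reduce(xor_bytes, group)
    return group + [parity]


def recover(received: List[Optional[bytes]]) -> Optional[List[bytes]]:
    """Return the data packets, rebuilding at most one lost packet.

    `received` holds the k data packets followed by the parity packet,
    with None marking a lost packet. Returns None when more than one
    packet is missing, since a single parity packet cannot repair that.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        return None
    if missing:
        present = [p for p in received if p is not None]
        # XOR of all surviving packets equals the missing one.
        rebuilt = reduce(xor_bytes, present)
        idx = missing[0]
        received = received[:idx] + [rebuilt] + received[idx + 1:]
    return received[:-1]  # drop the parity packet


def redundancy_overhead(k: int) -> float:
    """Extra packets sent per data packet for one parity packet per k."""
    return 1 / k
```

With k = 3, for example, the sender transmits four packets for every three data packets, and the receiver can repair any single loss within the group. Two losses in one group are unrecoverable at this layer, which is the case where receiver-side concealment has to take over.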