ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning

📅 2026-04-16

🤖 AI Summary

This work addresses the challenge of maintaining speech intelligibility in ultra-low-bitrate communication (e.g., 200 bps), where conventional neural codecs struggle because their acoustic reconstruction losses spend bits on perceptual detail. The authors model the quantisation process as a stochastic policy and fine-tune the encoder via reinforcement learning, using word error rate (WER) as the reward signal while keeping the acoustic reconstruction module frozen. By directly optimising bit allocation toward intelligibility rather than perceptual fidelity, the method reaches WERs of 3.20% and 8.93% on LibriSpeech test-clean and test-other, respectively, a 13% relative reduction over the non-RL baseline, while preserving perceptual quality. Even before RL fine-tuning, the codec is competitive with systems operating at higher bitrates.
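The idea of treating quantisation as a stochastic policy and rewarding low WER can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes plain REINFORCE with a moving-average baseline (the summary does not name the RL algorithm), and `TOY_VOCAB`, `frozen_decode_and_transcribe`, and `train` are hypothetical stand-ins for the frozen decoder, an ASR system, and the fine-tuning loop. Only the logits that define the code-selection policy are updated; the decoding step is never changed.

```python
import math
import random


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for the empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (r != h)))    # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sample_codes(logits, rng):
    """Draw one codebook index per frame from the categorical policy."""
    return [rng.choices(range(len(row)), weights=softmax(row))[0]
            for row in logits]


def reinforce_grad(logits, codes, advantage):
    """Gradient of advantage * log pi(codes) with respect to the logits."""
    grads = []
    for row, code in zip(logits, codes):
        probs = softmax(row)
        grads.append([advantage * ((1.0 if k == code else 0.0) - p)
                      for k, p in enumerate(probs)])
    return grads


# Hypothetical stand-in for "frozen decoder + ASR": each code maps to a word.
TOY_VOCAB = ["send", "help", "now", "noise"]


def frozen_decode_and_transcribe(codes):
    return " ".join(TOY_VOCAB[c] for c in codes)


def train(reference="send help now", steps=200, lr=0.5, seed=0):
    """Toy fine-tuning loop: only the policy logits change."""
    rng = random.Random(seed)
    n_frames = len(reference.split())
    logits = [[0.0] * len(TOY_VOCAB) for _ in range(n_frames)]
    baseline = 0.0  # moving average of past rewards (assumed variance reducer)
    for _ in range(steps):
        codes = sample_codes(logits, rng)
        hypothesis = frozen_decode_and_transcribe(codes)
        reward = -word_error_rate(reference, hypothesis)  # lower WER is better
        grads = reinforce_grad(logits, codes, reward - baseline)
        for row, grad_row in zip(logits, grads):
            for k in range(len(row)):
                row[k] += lr * grad_row[k]  # gradient ascent on expected reward
        baseline = 0.9 * baseline + 0.1 * reward
    return logits
```

Calling `train()` returns the updated logits; taking the highest-scoring entry per frame gives the code sequence the toy policy has come to prefer. In a real system the reward would come from an ASR transcript of the decoded audio, which is far noisier than this lookup.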

📝 Abstract

In bandwidth-constrained communication such as satellite and underwater channels, speech must often be transmitted at ultra-low bitrates where intelligibility is the primary objective. At such extreme compression levels, codecs trained with acoustic reconstruction losses tend to allocate bits to perceptual detail, leading to substantial degradation in word error rate (WER). This paper proposes ClariCodec, a neural speech codec operating at 200 bits per second (bps) that reformulates quantisation as a stochastic policy, enabling reinforcement learning (RL)-based optimisation of intelligibility. Specifically, the encoder is fine-tuned using WER-driven rewards while the acoustic reconstruction pipeline remains frozen. Even without RL, ClariCodec achieves 3.68% WER on the LibriSpeech test-clean set at 200 bps, already competitive with codecs operating at higher bitrates. Further RL fine-tuning reduces WER to 3.20% on test-clean and 8.93% on test-other, corresponding to a 13% relative reduction while preserving perceptual quality.
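The 13% figure can be checked against the two test-clean numbers quoted in the abstract. The helper name `relative_reduction` is only illustrative, and the check covers test-clean alone because the abstract does not state the pre-RL WER on test-other.

```python
def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


wer_without_rl = 3.68  # % WER on LibriSpeech test-clean, from the abstract
wer_with_rl = 3.20     # % WER on LibriSpeech test-clean after RL fine-tuning

print(f"{relative_reduction(wer_without_rl, wer_with_rl):.1%}")  # prints 13.0%
```

The absolute gain is 0.48 percentage points; dividing by the 3.68% starting point gives the roughly 13% relative reduction reported.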

Problem

Research questions and friction points this paper is trying to address.

ultra-low bitrate speech coding

speech intelligibility

word error rate

bandwidth-constrained communication

neural speech codec

Innovation

Methods, ideas, or system contributions that make the work stand out.

neural speech codec

reinforcement learning

ultra-low bitrate

word error rate

intelligibility optimisation
