🤖 AI Summary
This work addresses the challenge of voice conversion in Lombard speech scenarios where target speakers lack Lombard-style recordings. We propose a speaker identity conversion method that operates without any Lombard utterances from the target speaker. Our core innovation is an implicit acoustic feature conditioning strategy: instead of relying on explicit target-speaker Lombard features, the conversion is guided by universal Lombard acoustic priors, such as raised F1/F2 formants and increased intensity, which preserve critical Lombard characteristics while maintaining high speaker similarity. Experiments demonstrate that our approach achieves intelligibility gains comparable to those of explicit conditioning models that require target Lombard data, and significantly outperforms style-agnostic baselines. To the best of our knowledge, this is the first method to enable high-fidelity, zero-shot Lombard style transfer in speaker conversion.
📝 Abstract
Text-to-Speech (TTS) systems in the Lombard speaking style can improve the overall intelligibility of speech, which is useful for listeners with hearing loss and in noisy conditions. However, training such models requires large amounts of data, and the Lombard effect is challenging to record due to speaker and noise variability and fatiguing recording conditions. Voice conversion (VC) has been shown to be a useful augmentation technique for training TTS systems when no recordings of the target speaker in the target speaking style are available. In this paper, we are concerned with Lombard speaking style transfer. Our goal is to convert speaker identity while preserving the acoustic attributes that define the Lombard speaking style. We compare voice conversion models with implicit and explicit acoustic feature conditioning. We observe that our proposed implicit conditioning strategy achieves an intelligibility gain comparable to that of the model conditioned on explicit acoustic features, while also preserving speaker similarity.