Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion

📅 2025-07-04
🤖 AI Summary
To address data scarcity and speaker variability in low-resource dialect identification, this paper applies retrieval-based voice conversion (RVC) as a data augmentation method for German dialect classification. Converting multi-speaker utterances to a single target speaker reduces speaker-related variability, so the classifier can focus on dialect-specific linguistic and phonetic features. In the reported experiments, RVC improves classification performance when used as the only augmentation, and combining it with other augmentations such as frequency masking and segment removal yields additional gains. The results point to RVC as a promising way to limit speaker interference in low-resource dialect classification.

📝 Abstract
Deep learning models for dialect identification are often limited by the scarcity of dialectal data. To address this challenge, we propose to use Retrieval-based Voice Conversion (RVC) as an effective data augmentation method for a low-resource German dialect classification task. By converting audio samples to a uniform target speaker, RVC minimizes speaker-related variability, enabling models to focus on dialect-specific linguistic and phonetic features. Our experiments demonstrate that RVC enhances classification performance when utilized as a standalone augmentation method. Furthermore, combining RVC with other augmentation methods such as frequency masking and segment removal leads to additional performance gains, highlighting its potential for improving dialect classification in low-resource scenarios.

Problem

Research questions and friction points this paper is trying to address.

Addressing data scarcity in low-resource dialect classification
Using voice conversion to reduce speaker variability
Enhancing dialect classification with combined augmentation methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-based Voice Conversion for data augmentation
Minimizes speaker variability so models can focus on dialect-specific features
Combines with frequency masking and segment removal for additional performance gains
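
The two conventional augmentations named in the abstract, frequency masking and segment removal, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `max_width` and `max_len` parameters, the toy data, and the choice to mask a spectrogram but cut the raw waveform are all assumptions made for illustration. The RVC conversion step itself is not shown.

```python
import random


def frequency_mask(spec, max_width, rng):
    """Zero out one random band of consecutive frequency bins.

    `spec` is a list of frequency-bin rows, each a list of per-frame values.
    The band width is drawn uniformly from 0..max_width (an assumed policy).
    """
    n_bins = len(spec)
    width = rng.randint(0, min(max_width, n_bins))
    start = rng.randint(0, n_bins - width)
    return [
        [0.0] * len(row) if start <= i < start + width else list(row)
        for i, row in enumerate(spec)
    ]


def remove_segment(samples, max_len, rng):
    """Delete one random contiguous segment from a 1-D sample sequence."""
    n = len(samples)
    length = rng.randint(0, min(max_len, n))
    start = rng.randint(0, n - length)
    return list(samples[:start]) + list(samples[start + length:])


rng = random.Random(0)                  # fixed seed so runs are repeatable
spec = [[1.0] * 5 for _ in range(8)]    # toy 8-bin x 5-frame spectrogram
masked = frequency_mask(spec, max_width=3, rng=rng)

wave = list(range(100))                 # toy waveform of 100 samples
shortened = remove_segment(wave, max_len=20, rng=rng)
```

Both functions return new objects and leave their inputs untouched, so the original and the augmented sample can be kept side by side under the same dialect label.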