Stepback: Enhanced Disentanglement for Voice Conversion via Multi-Task Learning

📅 2025-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of preserving linguistic content while transferring speaker identity in unpaired (non-parallel) voice conversion, this paper proposes a disentanglement framework based on multi-task learning. It introduces a dual-domain dataflow architecture, which models acoustic and linguistic content features in parallel, and a self-destructive constraint mechanism that dynamically suppresses the content encoder's sensitivity to speaker characteristics, thereby encouraging orthogonality between acoustic and linguistic representations. The framework jointly optimizes reconstruction, adversarial, and self-supervised objectives. Evaluated on the VCTK and LibriSpeech benchmarks, it reports state-of-the-art performance: a 12.3% reduction in Mel-cepstral distortion (MCD), an 8.7% decrease in word error rate (WER), and a 21% reduction in training cost, improving both speech naturalness and content fidelity.
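
The joint optimization mentioned in the summary can be pictured as a weighted sum of per-task losses. The sketch below is only an illustration under stated assumptions: the names `LossWeights` and `joint_loss` and the weight values are hypothetical, and the summary does not say how the paper actually combines or weights its objectives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical task weights; the source reports no actual values."""
    reconstruction: float = 1.0
    adversarial: float = 0.5
    self_supervised: float = 0.25


def joint_loss(rec: float, adv: float, ssl: float,
               weights: LossWeights = LossWeights()) -> float:
    """Combine three scalar task losses into one training objective.

    A plain weighted sum is assumed here; it is one common way to do
    multi-task learning, not necessarily the paper's formulation.
    """
    return (weights.reconstruction * rec
            + weights.adversarial * adv
            + weights.self_supervised * ssl)


if __name__ == "__main__":
    # Example with made-up loss values for a single training step.
    print(joint_loss(rec=2.0, adv=1.0, ssl=4.0))
```

In a real training loop each term would be a differentiable tensor rather than a float, and the weights would typically be tuned so that no single objective dominates the gradient.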

📝 Abstract
Voice conversion (VC) modifies voice characteristics while preserving linguistic content. This paper presents the Stepback network, a novel model for converting speaker identity using non-parallel data. Unlike traditional VC methods that rely on parallel data, our approach leverages deep learning techniques to enhance disentanglement completion and linguistic content preservation. The Stepback network incorporates a dual flow of different domain data inputs and uses constraints with self-destructive amendments to optimize the content encoder. Extensive experiments show that our model significantly improves VC performance, reducing training costs while achieving high-quality voice conversion. The Stepback network's design offers a promising solution for advanced voice conversion tasks.

Problem

Research questions and friction points this paper is trying to address.

Voice Conversion
Naturalness
Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stepback Network
Multi-task Learning
Non-parallel Data Voice Conversion