Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Neural pitch transformation suffers from two problems: inaccurate pitch-timbre disentanglement caused by the source-filter model, and the lack of paired in-tune/out-of-tune data. To address both, this paper proposes an unpaired pitch conversion framework based on adversarial representation learning. The method introduces a pitch-invariant latent space and combines it with cycle-consistent GAN training, so that pitch mappings can be learned from unpaired data. It also unifies self-supervised representation learning with a neural vocoder in a single end-to-end generative architecture. Experiments on global key transposition and template-driven pitch conversion show improved synthesized audio quality (MOS ↑0.8) while preserving the original singer's timbral identity. The method reportedly achieves state-of-the-art performance in pitch accuracy, naturalness, and timbre fidelity, outperforming prior approaches on all three metrics.

📝 Abstract
Pitch manipulation is the process by which producers adjust the pitch of an audio segment to a specific key and intonation, and it is essential in music production. Neural-network-based pitch-manipulation systems have become popular in recent years because their synthesis quality is superior to that of classical DSP methods. However, their performance is still limited by inaccurate feature disentanglement with source-filter models and by the lack of paired in-tune and out-of-tune training data. This work proposes Neurodyne to address these issues. Specifically, Neurodyne uses adversarial representation learning to learn a pitch-independent latent representation, which avoids inaccurate disentanglement, and cycle-consistency training to create paired training data implicitly. Experimental results on global-key and template-based pitch manipulation demonstrate the effectiveness of the proposed system, showing improved synthesis quality while maintaining the original singer's identity.

Problem

Research questions and friction points this paper is trying to address.

Improving neural pitch manipulation accuracy via representation learning
Overcoming the lack of paired in-tune and out-of-tune training data
Enhancing synthesis quality while preserving singer identity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial representation learning for pitch-independent features
Cycle-consistency GAN for implicit paired data generation
Improved synthesis quality while preserving singer identity
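
The cycle-consistency idea listed above can be illustrated with a toy sketch. It is not Neurodyne's model: the paper's converter is a neural network operating on audio, whereas this example works on a bare f0 contour in Hz, and every function name here (`shift_f0`, `biased_shift_f0`, `cycle_loss_cents`, `cycle_consistency`) is hypothetical. The only facts assumed are equal-temperament arithmetic (one semitone is a frequency ratio of 2^(1/12), one cent is 1/100 of a semitone) and the general shape of a cycle loss: convert, convert back, and compare with the input, so no paired target is needed.

```python
import math


def shift_f0(f0_hz, semitones):
    """Ideal converter: transpose an f0 contour by a number of semitones."""
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio for f in f0_hz]


def biased_shift_f0(f0_hz, semitones, error_cents=20.0):
    """Hypothetical imperfect converter: lands error_cents sharp on every pass."""
    ratio = 2.0 ** ((semitones * 100.0 + error_cents) / 1200.0)
    return [f * ratio for f in f0_hz]


def cycle_loss_cents(original, reconstructed):
    """Mean absolute pitch deviation between two contours, in cents."""
    diffs = [abs(1200.0 * math.log2(r / o)) for o, r in zip(original, reconstructed)]
    return sum(diffs) / len(diffs)


def cycle_consistency(converter, f0_hz, semitones):
    """Shift up, shift back down, and measure how far we are from the input."""
    shifted = converter(f0_hz, semitones)
    restored = converter(shifted, -semitones)
    return cycle_loss_cents(f0_hz, restored)


# An arbitrary unpaired input contour (Hz); no in-tune reference is required.
contour = [220.0, 246.94, 261.63, 293.66]

ideal_loss = cycle_consistency(shift_f0, contour, 3)
biased_loss = cycle_consistency(biased_shift_f0, contour, 3)

print(f"ideal converter cycle loss:  {ideal_loss:.6f} cents")
print(f"biased converter cycle loss: {biased_loss:.6f} cents")
```

The ideal converter round-trips to the input up to floating-point error, while the biased one accumulates its 20-cent error on both passes and ends 40 cents sharp. In a GAN setting a penalty of this kind would be minimized alongside an adversarial loss, which is how cycle consistency can stand in for paired supervision.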