🤖 AI Summary
To address the limitations of conventional singing voice conversion (SVC) models, namely insufficient clarity, high data dependency, and excessive computational overhead, this paper proposes a lightweight, CPU-compatible SVC method tailored for high-fidelity, low-resource scenarios. We introduce the first diffusion-based SVC framework integrated into a streamlined architecture, incorporating a speech-quality-oriented inference enhancement mechanism and hardware-aware parallel optimization. Our approach combines OpenMP acceleration, compact acoustic feature modeling, and an efficient sampling algorithm. The resulting model occupies only 12 MB and runs in real time across diverse CPU platforms, delivering a 3.2–8.7× speedup over prior methods. Subjective evaluation yields a Mean Opinion Score (MOS) of 4.12, significantly surpassing existing lightweight SVC models, while preserving timbral fidelity, melodic consistency, and deployment efficiency.
📝 Abstract
Singing Voice Conversion (SVC) has emerged as a significant subfield of Voice Conversion (VC), enabling the transformation of one singer's voice into another while preserving musical elements such as melody, rhythm, and timbre. Traditional SVC methods are limited in audio quality, data requirements, and computational complexity. In this paper, we propose LHQ-SVC, a lightweight, CPU-compatible model that builds on the SVC framework and a diffusion model and is designed to reduce model size and computational demand without sacrificing performance. We incorporate features to improve inference quality and optimize for CPU execution using performance-tuning tools and parallel computing frameworks. Our experiments demonstrate that LHQ-SVC maintains competitive performance, with significant improvements in processing speed and efficiency across different devices. The results suggest that LHQ-SVC can meet