LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling

📅 2024-09-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
To address the limitations of conventional singing voice conversion (SVC) models, namely insufficient clarity, high data dependency, and excessive computational overhead, this paper proposes a lightweight, CPU-compatible SVC method for high-fidelity, low-resource scenarios. We integrate a diffusion-based SVC framework into a streamlined architecture, adding a quality-oriented inference enhancement mechanism and hardware-aware parallel optimization. The approach relies on OpenMP acceleration, compact acoustic feature modeling, and an efficient sampling algorithm. The resulting model occupies only 12 MB and runs in real time across diverse CPU platforms, with a 3.2–8.7× speedup over prior methods. Subjective evaluation yields a Mean Opinion Score (MOS) of 4.12, significantly above existing lightweight SVC models, while preserving timbral fidelity, melodic consistency, and deployment efficiency.
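The summary credits part of the speedup to "an efficient sampling algorithm" but does not name it. One common way to cut diffusion inference cost is deterministic DDIM-style sampling over a small subset of the training timesteps, so the sketch below assumes that technique. The linear beta schedule, the step count, the 80-bin frame, and the `oracle_eps` denoiser are all hypothetical stand-ins, not LHQ-SVC's actual components.

```python
import numpy as np


def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule (an assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def ddim_sample(eps_model, shape, alpha_bar, num_steps=20, seed=0):
    """Deterministic DDIM sampling (eta = 0) over num_steps of the training timesteps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    # Evenly spaced, descending subset of timesteps: 999, 946, ..., 0 for 20 steps
    timesteps = np.linspace(len(alpha_bar) - 1, 0, num_steps).round().astype(int)
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # alpha_bar at the next, less noisy timestep; 1.0 after the last step (clean signal)
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps = eps_model(x, t)  # one denoiser call per step: cost scales with num_steps
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x


# Toy check with a hypothetical "oracle" denoiser that knows the clean target,
# standing in for a trained network conditioned on content and pitch features.
alpha_bar = make_alpha_bar()
target = np.linspace(-1.0, 1.0, 80)  # hypothetical 80-bin mel-spectrogram frame


def oracle_eps(x, t):
    # Noise that would map the clean target to x at timestep t
    return (x - np.sqrt(alpha_bar[t]) * target) / np.sqrt(1.0 - alpha_bar[t])


mel = ddim_sample(oracle_eps, target.shape, alpha_bar, num_steps=20)
print(np.allclose(mel, target))  # True
```

With the oracle denoiser the sampler recovers the target exactly, using 20 denoiser calls instead of 1000. With a trained network, fewer steps trade some fidelity for speed. Whether LHQ-SVC uses DDIM or a different sampler is not stated here.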

📝 Abstract
Singing Voice Conversion (SVC) has emerged as a significant subfield of Voice Conversion (VC), enabling the transformation of one singer's voice into another while preserving musical elements such as melody, rhythm, and timbre. Traditional SVC methods have limitations in terms of audio quality, data requirements, and computational complexity. In this paper, we propose LHQ-SVC, a lightweight, CPU-compatible model based on the SVC framework and diffusion model, designed to reduce model size and computational demand without sacrificing performance. We incorporate features to improve inference quality, and optimize for CPU execution by using performance tuning tools and parallel computing frameworks. Our experiments demonstrate that LHQ-SVC maintains competitive performance, with significant improvements in processing speed and efficiency across different devices. The results suggest that LHQ-SVC can meet
Problem

Research questions and friction points this paper is trying to address.

Singing Voice Conversion
Lightweight Model
Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

LHQ-SVC
voice conversion
optimization techniques
Yubo Huang
School of Civil Engineering, Southwest Jiaotong University, China; Zhenguan AI Lab, ZhenGuan Innovation (Shenzhen) Technology Co. Ltd, China
Xin Lai
ByteDance
Multimodal Understanding, Multimodal Agent
Muyang Ye
SWJTU-Leeds Joint School, Southwest Jiaotong University, China
Anran Zhu
School of Computing and Artificial Intelligence, Southwest Jiaotong University, China
Zixi Wang
School of Computing and Artificial Intelligence, Southwest Jiaotong University, China
Jingzehua Xu
Shenzhen International Graduate School, Tsinghua University, China
Shuai Zhang
Department of Data Science, New Jersey Institute of Technology, USA
Zhiyuan Zhou
PhD student, UC Berkeley
Robotics, Reinforcement Learning
Weijie Niu
School of Economics and Management, Southwest Jiaotong University, China