🤖 AI Summary
This paper formally introduces and defines the task of K-pop live vocal separation, which addresses the challenge of extracting live vocals that heavily overlap with pre-recorded accompaniment in K-pop concerts. The authors propose a three-stage method combining deep source separation, inter-channel cross-correlation-based time-delay correction, and adaptive amplitude rescaling: (1) coarse vocal-accompaniment separation; (2) phase alignment of the vocal components across multi-channel recordings via cross-correlation; and (3) energy-consistency-driven amplitude rescaling to suppress residual accompaniment. Evaluated on a newly curated K-pop live audio dataset, the approach significantly outperforms baseline models, achieving an average 3.2 dB improvement in Signal-to-Distortion Ratio (SDR). To the best of the authors' knowledge, this is the first method to enable high-fidelity, automated extraction of live vocals under complex real-world mixing conditions, providing a practical technical foundation for fan-generated content creation, vocal performance analysis, and real-time interactive applications.
📝 Abstract
K-pop's global success is fueled by its dynamic performances and vibrant fan engagement. Inspired by K-pop fan culture, we propose a methodology for automatically extracting live vocals from performances. We use a combination of source separation, cross-correlation, and amplitude scaling to automatically remove pre-recorded vocals and instrumentals from a live performance. Our preliminary work introduces the task of live vocal separation and provides a foundation for future research on this topic.
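The cross-correlation and amplitude-scaling stages of the pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the coarse separation stage has already produced a live recording `live` and a pre-recorded reference track `ref`, and all function names and parameters here are hypothetical. The delay is estimated from the cross-correlation peak, and the gain is a least-squares projection of the live signal onto the aligned reference (one simple way to realize "energy-consistency-driven" rescaling).

```python
import numpy as np
from scipy.signal import correlate

def estimate_delay(live: np.ndarray, ref: np.ndarray) -> int:
    """Estimate the sample offset of `ref` within `live` via cross-correlation."""
    corr = correlate(live, ref, mode="full")
    # Index 0 of `corr` corresponds to lag -(len(ref) - 1).
    return int(np.argmax(corr)) - (len(ref) - 1)

def subtract_reference(live: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Time-align `ref` to `live`, rescale it by a least-squares gain,
    and subtract it to suppress the pre-recorded component."""
    delay = estimate_delay(live, ref)
    aligned = np.zeros_like(live)
    if delay >= 0:
        n = min(len(live) - delay, len(ref))
        aligned[delay:delay + n] = ref[:n]
    else:
        n = min(len(live), len(ref) + delay)
        aligned[:n] = ref[-delay:-delay + n]
    # Least-squares gain: projection of `live` onto the aligned reference.
    denom = np.dot(aligned, aligned)
    gain = np.dot(live, aligned) / denom if denom > 0 else 0.0
    return live - gain * aligned

# Toy demo: a "live" track = delayed, scaled reference + a small residual.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1000)
live = np.zeros(1200)
live[100:1100] = 0.7 * ref
live += 0.01 * rng.standard_normal(1200)  # residual (stand-in for live vocals)
print(estimate_delay(live, ref))  # recovers the 100-sample offset
```

Real concert audio would need this applied per channel and in short frames (the delay can drift over a performance), but the peak-picking and projection steps are the core of stages (2) and (3).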