AnyAccomp: Generalizable Accompaniment Generation via Quantized Melodic Bottleneck

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current singing accompaniment generation (SAG) methods rely on source-separated vocal inputs, so they overfit to separation artifacts and generalize poorly to clean, real-world a cappella or solo-instrument recordings. To address this, the authors propose AnyAccomp, built around a "quantized melodic bottleneck" that decouples accompaniment generation from source-dependent artifacts. First, a chromagram is quantized by a VQ-VAE into a discrete, timbre-invariant representation of the core melody; second, a flow-matching model synthesizes the accompaniment conditioned on these codes. This enables robust accompaniment generation directly from unseparated clean vocals and even solo-instrument audio. Experiments show competitive performance on standard source-separated vocal benchmarks and significant gains over baselines on clean studio vocals and solo-instrument test sets, establishing a more practical and generalizable foundation for music co-creation.
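
To make the first stage concrete, here is a minimal, hypothetical sketch of the melodic bottleneck: chroma frames are matched to their nearest codebook entries to produce discrete melody tokens. The sample rate, hop size, codebook size, and the use of raw chroma in place of the paper's trained VQ-VAE encoder are all assumptions for illustration.

```python
# Hypothetical sketch: quantize a chromagram into discrete melody tokens.
# In AnyAccomp a trained VQ-VAE does the quantization; here a plain
# nearest-neighbor lookup against a stand-in codebook illustrates the idea.
import numpy as np
import librosa

def melody_codes(audio_path, codebook, sr=24000, hop=320):
    """Return one discrete melody token per chroma frame."""
    y, _ = librosa.load(audio_path, sr=sr)
    # 12 pitch-class energies per frame: largely timbre-invariant
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop)  # (12, T)
    frames = chroma.T                                                  # (T, 12)
    # Squared Euclidean distance to every codebook entry, shape (T, K)
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Toy usage with a random codebook standing in for the learned one
rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 12))
# codes = melody_codes("vocals.wav", codebook)
```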

📝 Abstract
Singing Accompaniment Generation (SAG) is the process of generating instrumental music for a given clean vocal input. However, existing SAG techniques use source-separated vocals as input and overfit to separation artifacts. This creates a critical train-test mismatch, leading to failure on clean, real-world vocal inputs. We introduce AnyAccomp, a framework that resolves this by decoupling accompaniment generation from source-dependent artifacts. AnyAccomp first employs a quantized melodic bottleneck, using a chromagram and a VQ-VAE to extract a discrete and timbre-invariant representation of the core melody. A subsequent flow-matching model then generates the accompaniment conditioned on these robust codes. Experiments show AnyAccomp achieves competitive performance on separated-vocal benchmarks while significantly outperforming baselines on generalization test sets of clean studio vocals and, notably, solo instrumental tracks. This demonstrates a qualitative leap in generalization, enabling robust accompaniment for instruments - a task where existing models completely fail - and paving the way for more versatile music co-creation tools. Demo audio and code: https://anyaccomp.github.io
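
The second stage described above, conditional generation with flow matching, reduces at inference time to integrating a learned velocity field from noise to data. The sketch below assumes a trained network `v_net(x, t, codes)` and uses plain Euler steps; the function name, feature dimension, and step count are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal flow-matching inference sketch (assumed interface, not the
# paper's actual model): integrate dx/dt = v(x, t, codes) from t=0 to t=1.
import torch

@torch.no_grad()
def sample_accompaniment(v_net, melody_codes, feat_dim=80, steps=32):
    """Generate accompaniment features conditioned on discrete melody codes."""
    T = melody_codes.shape[0]
    x = torch.randn(1, T, feat_dim)        # start from Gaussian noise at t=0
    cond = melody_codes.unsqueeze(0)       # (1, T) conditioning tokens
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)
        x = x + dt * v_net(x, t, cond)     # one Euler step along the flow
    return x                               # e.g. a mel spectrogram for a vocoder
```

More ODE steps trade speed for fidelity; a practical appeal of flow matching is that a few dozen steps are typically enough.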
Problem

Research questions and friction points this paper is trying to address.

Existing SAG methods generate accompaniment only from source-separated vocals and overfit to separation artifacts
The resulting train-test mismatch causes failure on clean, real-world vocal inputs
Prior models cannot produce robust accompaniment for solo instruments at all
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantized melodic bottleneck representation
Flow-matching model for accompaniment generation
Discrete, timbre-invariant melody encoding (see the VQ sketch after this list)
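
The discrete encoding in the last bullet typically comes from a standard VQ-VAE layer: nearest-codebook lookup, a commitment loss, and straight-through gradients. The sketch below follows that textbook recipe; the codebook size, code dimension, and loss weight are assumptions, not the paper's reported values.

```python
# Textbook VQ-VAE quantizer (illustrative; hyperparameters are assumptions).
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=256, dim=12, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):  # z: (B, T, dim) encoder output
        # Squared distance from each frame to every code, shape (B, T, num_codes)
        d = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        idx = d.argmin(-1)                   # discrete melody codes
        q = self.codebook(idx)               # quantized vectors
        # Codebook loss pulls codes toward the encoder outputs; commitment
        # loss (weighted by beta) keeps the encoder near its chosen codes
        loss = ((q - z.detach()) ** 2).mean() + self.beta * ((q.detach() - z) ** 2).mean()
        q = z + (q - z).detach()             # straight-through gradient estimator
        return q, idx, loss
```

Because only the indices `idx` are passed downstream, timbre detail in the encoder input cannot leak into the accompaniment model, which is the point of the bottleneck.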