🤖 AI Summary
Existing music mastering style transfer methods apply fixed processing derived from a reference track and give users no fine-grained control over the result. This work proposes the first reference-based mastering style transfer framework built on inference-time optimization (ITO): the reference embedding is optimized during inference so that the output can be refined toward the user's artistic intent. The method supports both black-box and white-box mastering processors within a single architecture and uses CLAP audio-text joint embeddings to support text-conditioned control. Experiments demonstrate significant improvements in style similarity (+12.7% on objective metrics) and subjective preference scores (+23.4%), alongside strong robustness across diverse mastering styles and support for real-time interactive refinement.
📝 Abstract
Music mastering style transfer aims to model and apply the mastering characteristics of a reference track to a target track, simulating the professional mastering process. However, existing methods apply fixed processing based on a reference track, limiting users' ability to fine-tune the results to match their artistic intent. In this paper, we introduce the ITO-Master framework, a reference-based mastering style transfer system that integrates Inference-Time Optimization (ITO) to enable finer user control over the mastering process. By optimizing the reference embedding during inference, our approach allows users to refine the output dynamically, making micro-level adjustments to achieve more precise mastering results. We explore both black-box and white-box methods for modeling mastering processors and demonstrate that ITO improves mastering performance across different styles. Through objective evaluation, subjective listening tests, and qualitative analysis using text-based conditioning with CLAP embeddings, we validate that ITO enhances mastering style similarity while offering increased adaptability. Our framework provides an effective and user-controllable solution for mastering style transfer, allowing users to refine their results beyond the initial style transfer.
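The core idea of optimizing the reference embedding at inference time can be illustrated with a toy sketch. This is not the ITO-Master implementation: the "processor" here is a hypothetical per-band gain, the "embedding" `z` is just the list of gains, the style loss is a squared error on band levels, and the gradient is written out by hand. All names (`apply_processor`, `style_loss`, `ito_refine`) and all numbers are assumptions made for illustration only.

```python
# Toy illustration of inference-time optimization (ITO).
# Assumption: the mastering processor is a simple per-band gain and the
# "reference embedding" z is the vector of gains. The real system uses
# learned encoders and black-box or white-box processors instead.

def apply_processor(target_levels, z):
    """Hypothetical processor: scale each band level of the target by z."""
    return [x * g for x, g in zip(target_levels, z)]

def style_loss(output_levels, reference_levels):
    """Squared distance between output and reference band levels."""
    return sum((y - r) ** 2 for y, r in zip(output_levels, reference_levels))

def ito_refine(target_levels, reference_levels, z_init, steps=200, lr=0.5):
    """Refine the embedding z by gradient descent while the processor stays fixed."""
    z = list(z_init)
    history = []
    for _ in range(steps):
        y = apply_processor(target_levels, z)
        history.append(style_loss(y, reference_levels))
        # Analytic gradient of the loss with respect to each gain:
        # d/dz_b (x_b * z_b - r_b)^2 = 2 * (y_b - r_b) * x_b
        grad = [2.0 * (yb - rb) * xb
                for yb, rb, xb in zip(y, reference_levels, target_levels)]
        z = [zb - lr * gb for zb, gb in zip(z, grad)]
    return z, history

# Made-up band levels (low, mid, high) for a target and a reference track.
target = [0.5, 0.8, 0.3]
reference = [0.7, 0.6, 0.45]
z0 = [1.0, 1.0, 1.0]  # stand-in for the initial embedding

z_opt, losses = ito_refine(target, reference, z0)
print("refined embedding:", [round(g, 3) for g in z_opt])
print("loss before/after:", losses[0], losses[-1])
```

In this sketch only `z` is updated; the processor itself is left untouched, which is what distinguishes inference-time optimization from retraining. A user-facing system could stop after any number of steps, or swap the loss for a different objective such as similarity to a text prompt embedding.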