ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors

📅 2025-06-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing music mastering style transfer methods rely on fixed reference processing, lacking user-controllable, fine-grained adjustment capabilities. This work proposes the first inference-time optimization (ITO)-based reference mastering modeling framework, enabling dynamic optimization of reference embeddings during inference to realize artistic-intent-driven refinement. The method unifies black-box and white-box mastering processors within a single architecture and integrates CLAP-based audio-text joint embeddings to support text-conditioned control. Experiments demonstrate significant improvements in style similarity (+12.7% on objective metrics) and subjective preference scores (+23.4%), alongside strong robustness across diverse mastering styles and support for real-time interactive refinement.

📝 Abstract

Music mastering style transfer aims to model and apply the mastering characteristics of a reference track to a target track, simulating the professional mastering process. However, existing methods apply fixed processing based on a reference track, limiting users' ability to fine-tune the results to match their artistic intent. In this paper, we introduce the ITO-Master framework, a reference-based mastering style transfer system that integrates Inference-Time Optimization (ITO) to enable finer user control over the mastering process. By optimizing the reference embedding during inference, our approach allows users to refine the output dynamically, making micro-level adjustments to achieve more precise mastering results. We explore both black-box and white-box methods for modeling mastering processors and demonstrate that ITO improves mastering performance across different styles. Through objective evaluation, subjective listening tests, and qualitative analysis using text-based conditioning with CLAP embeddings, we validate that ITO enhances mastering style similarity while offering increased adaptability. Our framework provides an effective and user-controllable solution for mastering style transfer, allowing users to refine their results beyond the initial style transfer.
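The core idea of optimizing the reference embedding at inference time, while the trained model stays frozen, can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration and is not the paper's implementation: `frozen_processor` is a hypothetical stand-in for a trained mastering model, the embedding is a single value interpreted as a gain in dB, the style objective is just a target RMS level, and the gradient is estimated with finite differences rather than backpropagation.

```python
import math


def rms_db(signal):
    """Root-mean-square level of a signal in decibels."""
    mean_square = sum(s * s for s in signal) / len(signal)
    return 10.0 * math.log10(mean_square + 1e-12)


def frozen_processor(signal, embedding):
    """Hypothetical stand-in for a trained mastering model.

    Its behaviour is fixed; only the embedding passed in changes.
    Assumption: embedding[0] is interpreted as a gain in dB.
    """
    gain = 10.0 ** (embedding[0] / 20.0)
    return [gain * s for s in signal]


def style_loss(signal, embedding, target_db):
    """Toy style objective: squared distance to a target RMS level."""
    output = frozen_processor(signal, embedding)
    return (rms_db(output) - target_db) ** 2


def optimize_embedding(signal, embedding, target_db, steps=50, lr=0.1, eps=1e-4):
    """Refine the embedding by gradient descent; the processor is untouched."""
    z = list(embedding)
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            up, down = z[:], z[:]
            up[i] += eps
            down[i] -= eps
            # Central finite difference as a stand-in for autograd.
            diff = style_loss(signal, up, target_db) - style_loss(signal, down, target_db)
            grad.append(diff / (2.0 * eps))
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


# Usage: a quiet 440 Hz sine, refined toward a hypothetical -14 dB RMS target.
signal = [0.1 * math.sin(2.0 * math.pi * 440.0 * n / 44100.0) for n in range(4410)]
z_initial = [0.0]  # pretend this came from a reference encoder
z_refined = optimize_embedding(signal, z_initial, target_db=-14.0)
print("loss before:", style_loss(signal, z_initial, -14.0))
print("loss after: ", style_loss(signal, z_refined, -14.0))
```

In a real system the loss would compare learned style features of the output and the reference (or a text prompt), and the gradient would flow through a differentiable model; the sketch only shows the control flow of updating the embedding instead of the model weights.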

Problem

Research questions and friction points this paper is trying to address.

Enables dynamic user control over mastering style transfer

Improves precision in audio effects modeling

Enhances adaptability and style similarity in mastering

Innovation

Methods, ideas, or system contributions that make the work stand out.

Inference-Time Optimization for dynamic adjustments

Black-box and white-box mastering processor modeling

CLAP embeddings for text-based conditioning and qualitative analysis
