🤖 AI Summary
Existing sound-matching methods struggle to recover synthesizer modulation signals from audio in an interpretable and structurally faithful way. They often treat modulation as a black box or as high-dimensional frame-level parameters, and so neglect structural properties such as envelope shapes and LFO trajectories. This work introduces a differentiable modulation discovery framework built on DDSP. It models control signals with parameterized, structure-aware primitives (e.g., piecewise-linear envelopes and periodic LFOs) and optimizes their parameters end to end through differentiable synthesis and backpropagation, so that the recovered modulations reproduce the target timbre. The method reconstructs modulation structures with high fidelity and in a semantically meaningful form on both synthetic and real-world audio, is compatible with diverse DDSP synthesizer architectures, and is examined for the trade-off it incurs between interpretability and sound-matching accuracy. The code and audio samples are released, and the trained DDSP synths are provided in a VST plugin for practical music production and analysis. The core contribution is an end-to-end differentiable, structured, and interpretable inverse model of synthesizer modulation signals.
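To make "structure-aware primitives" concrete, the sketch below builds a control curve from a piecewise-linear ADSR envelope and a sine LFO instead of one free value per frame. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `adsr_envelope` and `sine_lfo`, the frame rate, and the routing to a normalized filter cutoff are all hypothetical, and it uses plain NumPy rather than a differentiable framework.

```python
import numpy as np


def adsr_envelope(n_frames, attack, decay, sustain, note_off, release):
    """Hypothetical piecewise-linear ADSR envelope on n_frames control frames.

    attack, decay, release: segment durations in frames.
    sustain: sustain level in [0, 1].
    note_off: frame index at which the release segment begins.
    """
    if min(attack, decay, release) <= 0 or attack + decay > note_off:
        raise ValueError("segments must be positive and end before note_off")
    t = np.arange(n_frames, dtype=float)
    # Breakpoint times (frames) and the envelope level at each breakpoint.
    xp = [0.0, attack, attack + decay, note_off, note_off + release]
    fp = [0.0, 1.0, sustain, sustain, 0.0]
    # Linear interpolation between breakpoints; frames after the last
    # breakpoint hold its level (0.0).
    return np.interp(t, xp, fp)


def sine_lfo(n_frames, frame_rate, rate_hz, depth, phase=0.0):
    """Hypothetical periodic LFO described by three numbers: rate, depth, phase."""
    t = np.arange(n_frames) / frame_rate  # frame times in seconds
    return depth * np.sin(2.0 * np.pi * rate_hz * t + phase)


frame_rate = 100  # control frames per second (assumed)
n_frames = 200    # 2 s of control signal
env = adsr_envelope(n_frames, attack=10, decay=20, sustain=0.6,
                    note_off=150, release=30)
lfo = sine_lfo(n_frames, frame_rate, rate_hz=4.0, depth=0.1)

# Route both primitives to a normalized filter-cutoff control:
# base level + envelope amount * envelope + LFO, clipped to [0, 1].
base, env_amount = 0.2, 0.7
cutoff_mod = np.clip(base + env_amount * env + lfo, 0.0, 1.0)

n_params = 5 + 3 + 2  # ADSR + LFO + routing (base, env_amount)
print(f"{cutoff_mod.size} control frames from {n_params} parameters")
```

Running it prints `200 control frames from 10 parameters`. A framewise parameterization of the same curve would need 200 free values, none of which maps directly to a knob a sound designer would recognize. That gap in parameter count and readability is what a constrained parameterization aims to exploit.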
📝 Abstract
Modulations are a critical part of sound design and music production, enabling the creation of complex and evolving audio. Modern synthesizers provide envelopes, low-frequency oscillators (LFOs), and other parameter automation tools that let users modulate the output with ease. However, determining the modulation signals used to create a sound is difficult, and existing sound-matching and parameter-estimation systems are often uninterpretable black boxes or predict high-dimensional framewise parameter values without considering the shape, structure, and routing of the underlying modulation curves. We propose a neural sound-matching approach that leverages modulation extraction, constrained control signal parameterizations, and differentiable digital signal processing (DDSP) to discover the modulations present in a sound. We demonstrate the effectiveness of our approach on highly modulated synthetic and real audio samples and its applicability to different DDSP synth architectures, and we investigate the trade-off it incurs between interpretability and sound-matching accuracy. We make our code and audio samples available and provide the trained DDSP synths in a VST plugin.
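One simple way to see the interpretability versus accuracy trade-off is to project an already-extracted framewise modulation curve onto piecewise-linear curves with fewer or more breakpoints. The sketch below does this with ordinary least squares over fixed, uniformly spaced knots. It is a toy illustration only: it fits a control curve rather than audio, it uses least squares instead of the gradient-based DDSP optimization described above, and the names `hat_basis` and `fit_piecewise_linear` as well as the synthetic target curve are hypothetical.

```python
import numpy as np


def hat_basis(n_frames, n_knots):
    """Matrix whose columns are piecewise-linear 'hat' functions.

    Any weighted sum of the columns is a piecewise-linear curve with
    n_knots uniformly spaced breakpoints over the n_frames control frames.
    """
    t = np.linspace(0.0, 1.0, n_frames)
    knots = np.linspace(0.0, 1.0, n_knots)
    eye = np.eye(n_knots)
    # Interpolating the k-th unit vector through the knots yields hat k.
    return np.stack([np.interp(t, knots, eye[k]) for k in range(n_knots)], axis=1)


def fit_piecewise_linear(curve, n_knots):
    """Least-squares knot values for a framewise modulation curve."""
    basis = hat_basis(len(curve), n_knots)
    values, *_ = np.linalg.lstsq(basis, curve, rcond=None)
    fitted = basis @ values
    rmse = float(np.sqrt(np.mean((fitted - curve) ** 2)))
    return values, fitted, rmse


# Hypothetical "extracted" framewise modulation: a decaying envelope with a
# small periodic wobble, 400 control frames long.
frames = np.arange(400)
target = np.exp(-frames / 120.0) + 0.05 * np.sin(2.0 * np.pi * frames / 25.0)

# Nested knot grids (2**k + 1 knots), so each finer grid contains the
# coarser one and the fit error can only stay the same or shrink.
errors = {}
for n_knots in (3, 5, 9, 17, 33):
    _, _, errors[n_knots] = fit_piecewise_linear(target, n_knots)
    print(f"{n_knots:2d} knots -> RMSE {errors[n_knots]:.4f}")
```

Each added knot lowers (or at worst keeps) the reconstruction error but adds another number a user has to read and edit. With very few knots, the wobble in the target cannot be represented at all, which mirrors the kind of accuracy cost a constrained parameterization can incur.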