AI Summary
In virtual instrument synthesis, note-level models struggle to maintain timbral consistency across varying pitches and velocities. To address this, we propose the Distributional Flow Matching (DFM) framework, which models the velocity field as a Gaussian distribution with predictive uncertainty and incorporates a music-perceptually motivated consistency regularization term. At inference, a confidence-weighted multi-trajectory search strategy is employed, significantly enhancing timbral coherence. The method optimizes a negative log-likelihood objective, jointly balancing generation fidelity and cross-note timbral consistency. Experimental results demonstrate that DFM outperforms the state-of-the-art TokenSynth in both monophonic fidelity and inter-note timbral consistency, while enabling real-time, professional-grade performance synthesis.
Abstract
Virtual instrument generation requires maintaining consistent timbre across different pitches and velocities, a challenge that existing note-level models struggle to address. We present FlowSynth, which combines distributional flow matching (DFM) with test-time optimization for high-quality instrument synthesis. Unlike standard flow matching that learns deterministic mappings, DFM parameterizes the velocity field as a Gaussian distribution and optimizes via negative log-likelihood, enabling the model to express uncertainty in its predictions. This probabilistic formulation allows principled test-time search: we sample multiple trajectories weighted by model confidence and select outputs that maximize timbre consistency. FlowSynth outperforms the current state-of-the-art TokenSynth baseline in both single-note quality and cross-note consistency. Our approach demonstrates that modeling predictive uncertainty in flow matching, combined with music-specific consistency objectives, provides an effective path to professional-quality virtual instruments suitable for real-time performance.
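To make the negative log-likelihood objective concrete, the following is a minimal sketch of a Gaussian NLL over a flow-matching velocity target. It is not the authors' implementation. The straight-line interpolation path, the diagonal log-variance parameterization, and the names `interpolate` and `gaussian_velocity_nll` are assumptions chosen for illustration, and fixed arrays stand in for a network's predicted mean and log-variance.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on an assumed straight-line path from noise x0 to data x1 at time t."""
    return (1.0 - t) * x0 + t * x1


def gaussian_velocity_nll(target_v, mean_v, log_var_v):
    """Mean per-element Gaussian NLL of the target velocity.

    mean_v and log_var_v play the role of a network's two output heads
    (hypothetical; the source does not specify the parameterization).
    """
    sq_err = (target_v - mean_v) ** 2
    nll = 0.5 * (np.log(2.0 * np.pi) + log_var_v + sq_err * np.exp(-log_var_v))
    return float(np.mean(nll))


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))   # noise samples
x1 = rng.standard_normal((4, 8))   # stand-in for data, e.g. latent audio frames
t = rng.uniform(size=(4, 1))       # one time value per sample
x_t = interpolate(x0, x1, t)       # what a network would receive along with t
target_v = x1 - x0                 # velocity of the straight-line path

# Deliberately poor mean prediction (all zeros) with two variance settings.
mean_v = np.zeros_like(target_v)
loss_confident = gaussian_velocity_nll(target_v, mean_v, np.full_like(target_v, -2.0))
loss_hedged = gaussian_velocity_nll(target_v, mean_v, np.zeros_like(target_v))
print(loss_confident, loss_hedged)
```

In this sketch, a wrong mean paired with a small predicted variance is penalized more heavily than the same wrong mean paired with a larger variance. This is the mechanism by which an NLL objective lets a model express uncertainty instead of committing to a single deterministic velocity.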