FlowSynth: Instrument Generation Through Distributional Flow Matching and Test-Time Search

2025-10-24
AI Summary
In virtual instrument synthesis, note-level models struggle to keep timbre consistent across pitches and note velocities. FlowSynth addresses this with distributional flow matching (DFM), which models the flow's velocity field as a Gaussian distribution so that the model can express predictive uncertainty, and adds a music-perceptually motivated consistency regularization term. Training optimizes a negative log-likelihood objective that balances generation fidelity against cross-note timbre consistency. At inference, a confidence-weighted multi-trajectory search samples several trajectories and selects the outputs that maximize timbre consistency. In the reported experiments, FlowSynth outperforms the state-of-the-art TokenSynth baseline in both single-note quality and cross-note timbre consistency, and the authors present it as a path to professional-quality virtual instruments suitable for real-time performance.

πŸ“ Abstract
Virtual instrument generation requires maintaining consistent timbre across different pitches and velocities, a challenge that existing note-level models struggle to address. We present FlowSynth, which combines distributional flow matching (DFM) with test-time optimization for high-quality instrument synthesis. Unlike standard flow matching that learns deterministic mappings, DFM parameterizes the velocity field as a Gaussian distribution and optimizes via negative log-likelihood, enabling the model to express uncertainty in its predictions. This probabilistic formulation allows principled test-time search: we sample multiple trajectories weighted by model confidence and select outputs that maximize timbre consistency. FlowSynth outperforms the current state-of-the-art TokenSynth baseline in both single-note quality and cross-note consistency. Our approach demonstrates that modeling predictive uncertainty in flow matching, combined with music-specific consistency objectives, provides an effective path to professional-quality virtual instruments suitable for real-time performance.
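To make the contrast with standard flow matching concrete, here is a minimal sketch of a Gaussian negative log-likelihood on the velocity target. It is not the authors' implementation: the linear interpolation path, the diagonal covariance, the log-variance parameterization, and the names `linear_path`, `gaussian_nll`, and `mse` are all assumptions made for illustration.

```python
import math

def linear_path(x0, x1, t):
    """Assumed linear path from noise x0 to data x1.

    Returns the interpolated point x_t and the target velocity x1 - x0.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target

def mse(target, mean):
    """Deterministic flow-matching loss: mean squared velocity error."""
    return sum((u - m) ** 2 for u, m in zip(target, mean)) / len(target)

def gaussian_nll(target, mean, log_var):
    """Distributional variant (assumed diagonal Gaussian).

    The model predicts a mean and a log-variance per dimension; the loss is
    the Gaussian negative log-likelihood of the target velocity, averaged
    over dimensions.
    """
    total = 0.0
    for u, m, lv in zip(target, mean, log_var):
        total += 0.5 * (lv + (u - m) ** 2 / math.exp(lv) + math.log(2.0 * math.pi))
    return total / len(target)
```

With the log-variance fixed at zero, the loss reduces to half the squared error plus a constant, which is the deterministic case. Letting the model raise the predicted variance where its error is large lowers the loss there, and that predicted variance is the uncertainty signal a test-time search can use.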
Problem

Research questions and friction points this paper is trying to address.

Generating virtual instruments with consistent timbre
Modeling predictive uncertainty in flow matching
Improving cross-note consistency for real-time performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributional flow matching models predictive uncertainty
Test-time search maximizes timbre consistency
Probabilistic formulation enables principled trajectory sampling
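The test-time search idea can be sketched roughly as follows. This is a hypothetical illustration, not the paper's procedure: it assumes each sampled trajectory yields a timbre embedding and a mean predicted log-variance, converts the latter into a confidence weight with a softmax, and scores candidates by weighted cosine similarity to a reference embedding. The names `confidence_weights`, `select_candidate`, and the scoring rule are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def confidence_weights(mean_log_vars):
    """Softmax over negative mean log-variance (assumed confidence proxy).

    Trajectories with lower predicted variance receive higher weight.
    """
    scores = [-lv for lv in mean_log_vars]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_candidate(candidates, reference):
    """Pick the candidate with the best confidence-weighted consistency.

    candidates: list of (timbre_embedding, mean_log_var) pairs, one per
    sampled trajectory. reference: embedding of the target timbre.
    Returns the index of the selected candidate.
    """
    weights = confidence_weights([lv for _, lv in candidates])
    scored = [
        w * cosine_similarity(emb, reference)
        for (emb, _), w in zip(candidates, weights)
    ]
    return max(range(len(scored)), key=scored.__getitem__)
```

For example, `select_candidate([([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.1], 2.0)], [1.0, 0.0])` prefers the first candidate: it matches the reference and has low predicted variance, while the third is close to the reference but down-weighted for its higher uncertainty.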
Qihui Yang
University of California, San Diego
Randal Leistikow
Smule Labs
Yongyi Zang
Smule, Inc.
Computer Audition, Speech Processing, Music Information Retrieval, Music Composition