FlowSynth: Instrument Generation Through Distributional Flow Matching and Test-Time Search

📅 2025-10-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In virtual instrument synthesis, note-level models struggle to keep timbre consistent across pitches and note velocities. To address this, we propose the Distributional Flow Matching (DFM) framework, which models the flow's velocity field as a Gaussian distribution so that the model can express predictive uncertainty, and which adds a perceptually motivated timbre-consistency term. The model is trained with a negative log-likelihood objective that balances generation fidelity against cross-note timbral consistency. At inference, a confidence-weighted multi-trajectory search samples several trajectories and selects the outputs with the most consistent timbre. Experimental results show that the method outperforms the state-of-the-art TokenSynth baseline in both single-note fidelity and inter-note timbral consistency, making it suitable for real-time, professional-grade performance synthesis.

📝 Abstract

Virtual instrument generation requires maintaining consistent timbre across different pitches and velocities, a challenge that existing note-level models struggle to address. We present FlowSynth, which combines distributional flow matching (DFM) with test-time optimization for high-quality instrument synthesis. Unlike standard flow matching that learns deterministic mappings, DFM parameterizes the velocity field as a Gaussian distribution and optimizes via negative log-likelihood, enabling the model to express uncertainty in its predictions. This probabilistic formulation allows principled test-time search: we sample multiple trajectories weighted by model confidence and select outputs that maximize timbre consistency. FlowSynth outperforms the current state-of-the-art TokenSynth baseline in both single-note quality and cross-note consistency. Our approach demonstrates that modeling predictive uncertainty in flow matching, combined with music-specific consistency objectives, provides an effective path to professional-quality virtual instruments suitable for real-time performance.
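To make the training objective concrete, here is a minimal NumPy sketch of a Gaussian negative log-likelihood over the velocity field. It is an illustration under stated assumptions, not the paper's implementation: the linear interpolation path between noise `x0` and data `x1`, the diagonal covariance, and the names `gaussian_nll`, `dfm_loss`, and `predict` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def gaussian_nll(target, mean, log_var):
    """Diagonal-Gaussian negative log-likelihood, averaged over the last axis."""
    inv_var = np.exp(-log_var)
    nll = 0.5 * (log_var + (target - mean) ** 2 * inv_var + np.log(2.0 * np.pi))
    return nll.mean(axis=-1)

def dfm_loss(predict, x0, x1, t):
    """Distributional flow-matching loss for one batch (illustrative sketch).

    predict(x_t, t) is a hypothetical model returning (mean, log_var),
    each with the same shape as x_t.
    x0: noise samples, x1: data samples, both (batch, dim); t: (batch,) in [0, 1].
    """
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1   # assumed linear probability path
    target = x1 - x0                # velocity of that path
    mean, log_var = predict(x_t, t)
    return gaussian_nll(target, mean, log_var).mean()
```

With a fixed unit variance (`log_var = 0`) this reduces to the usual squared-error flow-matching loss up to a constant. Letting the model predict `log_var` is what allows it to report lower confidence where the velocity is hard to predict.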

Problem

Research questions and friction points this paper is trying to address.

Generating virtual instruments with consistent timbre

Modeling predictive uncertainty in flow matching

Improving cross-note consistency for real-time performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributional flow matching models predictive uncertainty

Test-time search maximizes timbre consistency

Probabilistic formulation enables principled trajectory sampling

🔎 Similar Papers

Measuring audio prompt adherence with distribution-based embedding distances