Compose Yourself: Average-Velocity Flow Matching for One-Step Speech Enhancement

📅 2025-09-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Diffusion and flow-matching models for speech enhancement rely on multi-step sampling, which is computationally expensive and sensitive to discretization error. To address this, the paper proposes COSE, a one-step generative framework. Its core idea is to reformulate the dynamics through an average velocity field, computed efficiently with a velocity composition identity, which avoids the costly Jacobian-vector products that MeanFlow training requires. The approach remains theoretically consistent with continuous-time flow matching. On standard benchmarks, COSE achieves up to 5× faster sampling and reduces training cost by 40% without compromising speech quality.

📝 Abstract

Diffusion and flow matching (FM) models have achieved remarkable progress in speech enhancement (SE), yet their dependence on multi-step generation is computationally expensive and vulnerable to discretization errors. Recent advances in one-step generative modeling, particularly MeanFlow, provide a promising alternative by reformulating dynamics through average velocity fields. In this work, we present COSE, a one-step FM framework tailored for SE. To address the high training overhead of Jacobian-vector product (JVP) computations in MeanFlow, we introduce a velocity composition identity to compute average velocity efficiently, eliminating expensive computation while preserving theoretical consistency and achieving competitive enhancement quality. Extensive experiments on standard benchmarks show that COSE delivers up to 5x faster sampling and reduces training cost by 40%, all without compromising speech quality. Code is available at https://github.com/ICDM-UESTC/COSE.
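
For readers unfamiliar with average velocity fields, the relations below sketch the quantities the abstract refers to. The symbols (z for the state, v for the instantaneous velocity, u for the average velocity, and times r ≤ s ≤ t) are notation assumed here for illustration. The last line is the generic additivity property of the integral; the exact composition identity used in COSE may be stated differently in the paper.

```latex
% Average velocity over [r, t] along a trajectory z_\tau with instantaneous velocity v
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau

% MeanFlow identity: differentiate (t - r)\,u(z_t, r, t) with respect to t.
% The total derivative on the right is what requires a Jacobian-vector product.
u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt} u(z_t, r, t),
\qquad
\frac{d}{dt} u(z_t, r, t) = v(z_t, t)\,\partial_z u + \partial_t u

% Additivity of the integral for r \le s \le t (one possible composition identity)
(t - r)\, u(z_t, r, t) = (s - r)\, u(z_s, r, s) + (t - s)\, u(z_t, s, t)
```

With an accurate average velocity, a single update z_t = z_r + (t - r) u(z_t, r, t) covers the whole interval, which is what makes one-step sampling possible.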

Problem

Research questions and friction points this paper is trying to address.

One-step speech enhancement with reduced computational cost

Efficient average velocity computation avoiding Jacobian-vector products

Maintaining speech quality while accelerating sampling and training

Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step flow matching framework

Velocity composition identity that replaces Jacobian-vector products

Efficient average velocity calculation
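
The idea of composing average velocities can be checked numerically on a toy problem. The sketch below is not the COSE implementation: it uses a scalar ODE with a closed-form solution instead of a speech model, and all function names (`trajectory`, `velocity`, `average_velocity`) are hypothetical. It shows that the average velocity over a long interval equals the length-weighted combination of average velocities over two sub-intervals, and that one step with the average velocity lands on the exact endpoint.

```python
import numpy as np


def trajectory(z0, t):
    """Closed-form solution of the toy ODE dz/dt = -z with z(0) = z0."""
    return z0 * np.exp(-t)


def velocity(z, t):
    """Instantaneous velocity field of the toy ODE (independent of t here)."""
    return -z


def average_velocity(z0, r, t, n=2001):
    """Average of the instantaneous velocity along the trajectory over [r, t],
    approximated with the trapezoidal rule on n grid points."""
    taus = np.linspace(r, t, n)
    v = velocity(trajectory(z0, taus), taus)
    dt = taus[1] - taus[0]
    integral = dt * (v.sum() - 0.5 * (v[0] + v[-1]))
    return integral / (t - r)


# Assumed toy values: initial state and three times with r <= s <= t.
z0, r, s, t = 2.0, 0.1, 0.4, 0.9

u_rt = average_velocity(z0, r, t)
u_rs = average_velocity(z0, r, s)
u_st = average_velocity(z0, s, t)

# Composition: weight each sub-interval's average velocity by its length.
composed = ((s - r) * u_rs + (t - s) * u_st) / (t - r)

# One-step update from z_r to z_t using the average velocity over [r, t].
one_step = trajectory(z0, r) + (t - r) * u_rt

composition_gap = abs(u_rt - composed)
one_step_gap = abs(one_step - trajectory(z0, t))
print(composition_gap, one_step_gap)
```

Both printed gaps should be close to zero, limited only by quadrature error. In a learned setting the average velocity would come from a neural network rather than from quadrature, and a relation of this kind could serve as a training target that avoids differentiating through the network.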

🔎 Similar Papers

High-Resolution Speech Restoration with Latent Diffusion Model