SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving

📅 2026-03-09
🤖 AI Summary
This work addresses the high inference latency and low throughput of text-to-audio diffusion models, which typically require dozens of denoising steps. The authors propose a training-free, model-agnostic serving framework that caches previously generated audio, retrieves entries matched on both prompt semantics and audio duration, and reuses them as warm starts so that redundant denoising steps can be skipped. Combining dynamic skip gating with quality-aware cache eviction and refinement, the method reduces latency by 1.8–3.0× on real-world audio traces with a cache of only about 1K entries, while preserving or improving perceptual audio quality.

📝 Abstract
Text-to-audio diffusion models produce high-fidelity audio but require tens of function evaluations (NFEs), incurring multi-second latency and limited throughput. We present SoundWeaver, the first training-free, model-agnostic serving system that accelerates text-to-audio diffusion by warm-starting from semantically similar cached audio. SoundWeaver introduces three components: a Reference Selector that retrieves and temporally aligns cached candidates via semantic and duration-aware gating; a Skip Gater that dynamically determines the percentage of NFEs to skip; and a lightweight Cache Manager that maintains cache utility through quality-aware eviction and refinement. On real-world audio traces, SoundWeaver achieves 1.8–3.0× latency reduction with a cache of only ~1K entries while preserving or improving perceptual quality.
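
To make the retrieval and gating idea concrete, here is a minimal Python sketch of how a semantic, duration-aware lookup and a skip decision could fit together. It is not the authors' implementation: the abstract does not give the similarity measure, thresholds, or gating rule, so `CacheEntry`, `select_reference`, `steps_to_skip`, the cosine similarity, the duration-ratio gate, and every default value below are assumptions chosen for illustration. Temporal alignment, re-noising of the cached audio, and cache eviction are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Hypothetical cache record; all field names are illustrative."""
    prompt_embedding: list[float]  # text embedding of the cached prompt
    duration_s: float              # length of the cached audio (must be > 0)
    latent: list[float]            # stand-in for the cached audio latent


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_reference(query_emb, query_duration_s, cache,
                     min_sim=0.6, max_duration_ratio=1.5):
    """Return (entry, similarity) for the best cached candidate, or (None, 0.0).

    Assumed gating: skip entries whose duration differs from the request by
    more than max_duration_ratio, then keep the most similar remaining entry
    whose similarity is at least min_sim.
    """
    best, best_sim = None, min_sim
    for entry in cache:
        longer = max(entry.duration_s, query_duration_s)
        shorter = min(entry.duration_s, query_duration_s)
        if longer / shorter > max_duration_ratio:
            continue  # duration gate failed
        sim = cosine(query_emb, entry.prompt_embedding)
        if sim >= best_sim:
            best, best_sim = entry, sim
    return best, (best_sim if best is not None else 0.0)


def steps_to_skip(similarity, total_steps, min_sim=0.6, max_skip_fraction=0.7):
    """Assumed gating rule: the skipped share of steps grows linearly with similarity."""
    if similarity < min_sim:
        return 0  # no usable reference: run the full denoising schedule
    fraction = max_skip_fraction * (similarity - min_sim) / (1.0 - min_sim)
    return min(int(fraction * total_steps), total_steps - 1)
```

In this sketch a request with no acceptable reference falls back to the full schedule, and at least one denoising step always runs so the output is still conditioned on the new prompt. A real system would also need to align the cached audio to the requested duration and add noise matching the step at which denoising resumes.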
Problem

Research questions and friction points this paper is trying to address.

text-to-audio diffusion
latency
function evaluations
throughput
audio generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models
text-to-audio generation
semantic warm-starting
cache-aware acceleration
training-free serving
Authors

Ayush Barik
University of Illinois Urbana-Champaign, USA
Sofia Stoica
University of Illinois Urbana-Champaign, USA
Nikhil Sarda
Assured Intelligence, USA
Arnav Kethana
University of Illinois Urbana-Champaign, USA
Abhinav Khanduja
University of Illinois Urbana-Champaign, USA
Muchen Xu
University of Illinois Urbana-Champaign, USA
Fan Lai
University of Illinois Urbana-Champaign
Machine Learning Systems, Cloud Computing, Machine Learning