A2SB: Audio-to-Audio Schrodinger Bridges

📅 2025-01-20
🤖 AI Summary
This paper targets high-frequency attenuation, limited bandwidth, and corrupted segments in 44.1 kHz high-fidelity music, all of which degrade perceived quality. It proposes an end-to-end waveform audio restoration model grounded in Schrödinger bridge theory, described as the first of its kind. The method integrates continuous-time diffusion modeling with full-waveform supervised training to synthesize high-quality waveforms directly, without a vocoder. It supports both bandwidth extension and arbitrary-length gap inpainting, scales to hour-long audio, and enables near-real-time inference. Applying Schrödinger bridges to audio-to-audio modeling addresses bottlenecks in long-horizon, high-sample-rate signal representation. Evaluated on several out-of-distribution music test sets, the approach achieves state-of-the-art quality in both bandwidth extension and inpainting.

📝 Abstract
Audio in the real world may be perturbed by numerous factors, which degrades its quality. This work presents an audio restoration model tailored for high-resolution music at 44.1 kHz. Our model, Audio-to-Audio Schrodinger Bridges (A2SB), is capable of both bandwidth extension (predicting high-frequency components) and inpainting (re-generating missing segments). Critically, A2SB is end-to-end and needs no vocoder to predict waveform outputs, can restore hour-long audio inputs, and is trained on permissively licensed music data. A2SB achieves state-of-the-art bandwidth extension and inpainting quality on several out-of-distribution music test sets. Our demo website is https://research.nvidia.com/labs/adlr/A2SB/.
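
To make the two restoration tasks concrete, the sketch below simulates the kind of degraded input each one starts from: a band-limited signal (for bandwidth extension) and a signal with a silenced segment (for inpainting). It is only an illustration and not the authors' data pipeline. The function names, the brick-wall FFT low-pass, the 4 kHz cutoff, and the gap position are all assumptions chosen for the example; only the 44.1 kHz sample rate comes from the abstract.

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz, the rate named in the abstract


def lowpass_fft(wave, cutoff_hz, sample_rate=SAMPLE_RATE):
    """Zero every frequency bin above cutoff_hz (a brick-wall low-pass).

    Hypothetical helper: simulates the band-limited input that a
    bandwidth-extension model would be asked to restore.
    """
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(wave))


def mask_segment(wave, start_s, duration_s, sample_rate=SAMPLE_RATE):
    """Silence one contiguous segment; return the masked copy and a boolean mask.

    Hypothetical helper: simulates the missing segment that an
    inpainting model would be asked to re-generate.
    """
    mask = np.zeros(len(wave), dtype=bool)
    start = int(round(start_s * sample_rate))
    stop = min(len(wave), start + int(round(duration_s * sample_rate)))
    mask[start:stop] = True
    masked = wave.copy()
    masked[mask] = 0.0
    return masked, mask


# One second of a toy signal: a 440 Hz tone plus a quieter 12 kHz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clean = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 12_000 * t)

# Bandwidth-extension input: everything above 4 kHz (assumed cutoff) is removed.
band_limited = lowpass_fft(clean, cutoff_hz=4_000)

# Inpainting input: 0.2 s of audio starting at 0.4 s (assumed gap) is silenced.
gapped, gap_mask = mask_segment(clean, start_s=0.4, duration_s=0.2)
```

In this toy setup, a restoration model would receive `band_limited` or `gapped` (plus `gap_mask`) and be judged on how closely its output matches `clean`.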
Problem

Research questions and friction points this paper is trying to address.

Audio Quality Degradation
High-Frequency Deficiency
Music Fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Audio-to-Audio Schrodinger Bridges
High-Fidelity Music Enhancement
Self-Contained Operation