A2SB: Audio-to-Audio Schrodinger Bridges

📅 2025-01-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

High-frequency attenuation, bandwidth limitation, and segment corruption degrade the perceived quality of 44.1 kHz high-fidelity music. To address these problems, this paper proposes the first end-to-end waveform audio restoration model grounded in Schrödinger bridge theory. The method integrates continuous-time diffusion modeling with full-waveform supervised training to synthesize high-quality waveforms directly, eliminating the need for a vocoder. It supports both bandwidth extension and arbitrary-length gap inpainting, scales to one-hour audio sequences, and enables near-real-time inference. Applying Schrödinger bridges to audio-to-audio modeling lets it overcome bottlenecks in representing long, high-sample-rate signals. Evaluated on multiple out-of-distribution music benchmarks, the approach achieves state-of-the-art performance in both bandwidth extension and inpainting.

📝 Abstract

Real-world audio may be perturbed by numerous factors that degrade its quality. This work presents an audio restoration model tailored for high-resolution music at 44.1 kHz. Our model, Audio-to-Audio Schrodinger Bridges (A2SB), is capable of both bandwidth extension (predicting high-frequency components) and inpainting (re-generating missing segments). Critically, A2SB is end-to-end and needs no vocoder to predict waveform outputs, can restore hour-long audio inputs, and is trained on permissively licensed music data. A2SB achieves state-of-the-art bandwidth extension and inpainting quality on several out-of-distribution music test sets. Our demo website is https://research.nvidia.com/labs/adlr/A2SB/.
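To make the two restoration tasks concrete, the following minimal sketch builds the kind of degraded input each task starts from. It is an illustration only, not the A2SB training pipeline: the helper names `band_limit` and `cut_gap`, the brick-wall low-pass, the 4 kHz cutoff, and the toy two-tone signal are all assumptions chosen for clarity. Only the 44.1 kHz sample rate comes from the abstract.

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz, the sample rate named in the abstract


def band_limit(wave, cutoff_hz, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: zero every frequency bin above cutoff_hz.

    This idealized brick-wall low-pass stands in for bandwidth limitation;
    bandwidth extension is the task of predicting the removed content.
    """
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(wave))


def cut_gap(wave, start_s, length_s, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: silence one segment of the waveform.

    Returns the degraded waveform and a boolean mask marking the missing
    samples; inpainting is the task of re-generating the masked region.
    """
    start = int(round(start_s * sample_rate))
    stop = min(len(wave), start + int(round(length_s * sample_rate)))
    mask = np.zeros(len(wave), dtype=bool)
    mask[start:stop] = True
    degraded = wave.copy()
    degraded[mask] = 0.0
    return degraded, mask


# Toy "clean" signal (an assumption): one second of a 440 Hz tone plus a
# quieter 12 kHz tone that represents high-frequency content.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
low_tone = np.sin(2 * np.pi * 440 * t)
clean = low_tone + 0.5 * np.sin(2 * np.pi * 12_000 * t)

narrowband = band_limit(clean, cutoff_hz=4_000)             # input for bandwidth extension
gapped, missing = cut_gap(clean, start_s=0.4, length_s=0.2)  # input for inpainting
```

In this sketch a restoration model would receive `narrowband` or `gapped` (together with `missing`) and be asked to recover `clean`. How A2SB itself represents and conditions on such inputs is not specified in the abstract.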

Problem

Research questions and friction points this paper is trying to address.

Audio Quality Degradation

High-Frequency Deficiency

Music Fidelity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Audio-to-Audio Schrodinger Bridges

High-Fidelity Music Enhancement

Self-Contained Operation
