Ambisonics Super-Resolution Using A Waveform-Domain Neural Network

📅 2025-07-31
🤖 AI Summary
First-order Ambisonics (FOA) uses only four channels, which keeps the format efficient but limits its spatial resolution. This paper proposes a waveform-domain deep learning approach to FOA-to-higher-order Ambisonics (HOA) super-resolution. A fully convolutional time-domain network based on Conv-TasNet maps four-channel FOA waveforms directly to third-order HOA waveforms, without the physics-based or psychoacoustically motivated modeling that conventional renderers rely on. Quantitative evaluation shows a 0.6 dB average positional mean squared error difference between predicted and actual third-order HOA, and the median qualitative rating shows an 80% improvement in perceived quality over the traditional rendering approach, while the input remains in the compact FOA format.

📝 Abstract
Ambisonics is a spatial audio format describing a sound field. First-order Ambisonics (FOA) is a popular format comprising only four channels. This limited channel count comes at the expense of spatial accuracy. Ideally, one would keep the efficiency of the FOA format without its limitations. We have devised a data-driven spatial audio solution that retains the efficiency of the FOA format but achieves quality that surpasses conventional renderers. Using a fully convolutional time-domain audio neural network (Conv-TasNet), we created a solution that takes an FOA input and provides a higher-order Ambisonics (HOA) output. This data-driven approach is novel compared to typical physics-based and psychoacoustics-based renderers. Quantitative evaluations showed a 0.6 dB average positional mean squared error difference between predicted and actual 3rd-order HOA. The median qualitative rating showed an 80% improvement in perceived quality over the traditional rendering approach.
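The channel counts behind the abstract can be made concrete with a small sketch. A full-sphere Ambisonics signal of order N has (N + 1)^2 channels, so FOA has 4 and 3rd-order HOA has 16. The encoder below is illustrative only: the function names are hypothetical, and the ACN channel order (W, Y, Z, X) with SN3D gains is an assumption, since the abstract does not state which convention the paper uses.

```python
import math


def ambisonic_channel_count(order):
    """Number of channels in a full-sphere Ambisonics signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def encode_foa(sample, azimuth, elevation):
    """Encode one mono sample as FOA.

    Assumes ACN channel order (W, Y, Z, X) and SN3D gains; angles are in
    radians, with azimuth measured counter-clockwise from the front.
    """
    cos_el = math.cos(elevation)
    return [
        sample,                               # W: omnidirectional
        sample * math.sin(azimuth) * cos_el,  # Y: left-right
        sample * math.sin(elevation),         # Z: up-down
        sample * math.cos(azimuth) * cos_el,  # X: front-back
    ]


foa_channels = ambisonic_channel_count(1)   # input side of the super-resolution task
hoa_channels = ambisonic_channel_count(3)   # output side of the super-resolution task
front = encode_foa(1.0, 0.0, 0.0)           # a source straight ahead
```

Under these assumptions, a source straight ahead puts all of its directional energy into X, which shows how little angular detail four channels can carry compared with sixteen.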
Problem

Research questions and friction points this paper is trying to address.

Enhancing spatial accuracy of First-order Ambisonics (FOA)
Converting FOA to higher-order Ambisonics (HOA) using neural networks
Improving perceived audio quality over traditional rendering methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Waveform-domain neural network (Conv-TasNet) for Ambisonics super-resolution
Maps four-channel FOA input directly to 3rd-order HOA output
Data-driven approach rated higher in perceived quality than the traditional rendering approach
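To make the input/output contract of FOA-to-HOA conversion concrete, here is a minimal sketch of a naive baseline and an error measure. It is not the paper's method or metric. The baseline assumes ACN channel ordering with a shared normalization, so the first four 3rd-order HOA channels coincide with the FOA channels and the remaining twelve are left silent. `error_db` is a hypothetical relative-error measure in decibels, chosen for illustration because the abstract does not define its "positional mean squared error" precisely.

```python
import math


def zero_pad_foa_to_hoa(foa, hoa_channels=16):
    """Naive baseline: copy the FOA channels, leave higher-order channels silent.

    `foa` is a list of channels, each a list of samples of equal length.
    """
    num_samples = len(foa[0])
    silent = [[0.0] * num_samples for _ in range(hoa_channels - len(foa))]
    return [list(channel) for channel in foa] + silent


def error_db(predicted, reference):
    """Error energy relative to reference energy, in dB (hypothetical metric)."""
    err = sum(
        (p - r) ** 2
        for pred_ch, ref_ch in zip(predicted, reference)
        for p, r in zip(pred_ch, ref_ch)
    )
    ref = sum(r ** 2 for ref_ch in reference for r in ref_ch)
    if ref == 0.0:
        raise ValueError("reference signal has zero energy")
    if err == 0.0:
        return float("-inf")  # perfect reconstruction
    return 10.0 * math.log10(err / ref)


# Toy reference: 16 channels, 2 samples each, all ones (not real audio).
reference_hoa = [[1.0, 1.0] for _ in range(16)]
foa_input = reference_hoa[:4]

baseline = zero_pad_foa_to_hoa(foa_input)
baseline_error = error_db(baseline, reference_hoa)
```

A learned model would replace `zero_pad_foa_to_hoa` with a network that predicts the twelve missing channels from the four it receives; the baseline only marks the point such a model has to improve on.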