Ambisonics Super-Resolution Using A Waveform-Domain Neural Network

📅 2025-07-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the low spatial resolution of first-order Ambisonics (FOA), which follows from its four-channel format and forces a trade-off between efficiency and spatial accuracy, this paper proposes a waveform-domain deep learning approach for super-resolving FOA to higher-order Ambisonics (HOA). The method uses a fully convolutional time-domain network (Conv-TasNet) to map four-channel FOA waveforms directly to third-order HOA waveforms, offering a data-driven alternative to conventional physics-based and psychoacoustically motivated renderers. Quantitative evaluation reports a 0.6 dB average positional mean squared error difference between predicted and actual third-order HOA, and the median qualitative rating shows an 80% improvement in perceived quality over a traditional rendering approach, while the input retains FOA's efficiency.

📝 Abstract

Ambisonics is a spatial audio format describing a sound field. First-order Ambisonics (FOA) is a popular format comprising only four channels. This limited channel count comes at the expense of spatial accuracy. Ideally, one would be able to keep the efficiency of the FOA format without its limitations. We have devised a data-driven spatial audio solution that retains the efficiency of the FOA format but achieves quality that surpasses conventional renderers. Utilizing a fully convolutional time-domain audio neural network (Conv-TasNet), we created a solution that takes an FOA input and provides a higher-order Ambisonics (HOA) output. This data-driven approach is novel when compared to typical physics- and psychoacoustics-based renderers. Quantitative evaluations showed a 0.6 dB average positional mean squared error difference between predicted and actual third-order HOA. The median qualitative rating showed an 80% improvement in perceived quality over the traditional rendering approach.
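
As a side note on the channel counts mentioned in the abstract: a full-sphere Ambisonics signal of order N carries (N + 1)^2 channels, which gives the four channels of FOA and sixteen channels at third order. The short sketch below only illustrates that arithmetic; the function name `ambisonic_channels` is a hypothetical helper and does not come from the paper.

```python
def ambisonic_channels(order: int) -> int:
    """Channel count of a full-sphere Ambisonics signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


foa_channels = ambisonic_channels(1)   # first order
hoa3_channels = ambisonic_channels(3)  # third order

for n in range(4):
    print(f"order {n}: {ambisonic_channels(n)} channels")

# Ratio of output to input channels for FOA -> third-order HOA.
print(hoa3_channels / foa_channels)
```

Under this formula, going from first to third order quadruples the channel count, which is the efficiency gap the FOA-to-HOA mapping aims to bridge.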

Problem

Research questions and friction points this paper is trying to address.

Enhancing spatial accuracy of First-order Ambisonics (FOA)

Converting FOA to higher-order Ambisonics (HOA) using neural networks

Improving perceived audio quality over traditional rendering methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Waveform-domain neural network for Ambisonics super-resolution

Converts first-order to higher-order Ambisonics

Data-driven approach outperforms traditional renderers
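
To make the "waveform-domain" idea concrete, here is a minimal shape-level sketch, assuming a single multichannel 1-D convolution with random weights. It is not the paper's Conv-TasNet model and has no trained parameters; it only shows that a time-domain layer can take a 4-channel FOA waveform and emit a 16-channel (third-order) waveform of the same length. All names (`conv1d_same`, `FOA_CHANNELS`, `HOA_CHANNELS`) and the sizes chosen are illustrative assumptions.

```python
import random

FOA_CHANNELS = 4    # first-order input: (1 + 1) ** 2
HOA_CHANNELS = 16   # third-order output: (3 + 1) ** 2


def conv1d_same(x, weights):
    """Multichannel 1-D convolution with zero padding (output length == input length).

    x:       list of in_ch lists, each holding T samples
    weights: weights[out_ch][in_ch][k], with an odd kernel size k
    returns: list of out_ch lists, each holding T samples

    Like most deep learning frameworks, this computes a cross-correlation
    (the kernel is not flipped).
    """
    in_ch, num_samples = len(x), len(x[0])
    kernel_size = len(weights[0][0])
    pad = kernel_size // 2
    out = []
    for w_out in weights:
        y = [0.0] * num_samples
        for c in range(in_ch):
            for t in range(num_samples):
                acc = 0.0
                for j in range(kernel_size):
                    idx = t + j - pad
                    if 0 <= idx < num_samples:  # samples outside the signal count as zero
                        acc += w_out[c][j] * x[c][idx]
                y[t] += acc
        out.append(y)
    return out


rng = random.Random(0)
T, K = 64, 5  # hypothetical number of samples and kernel size

# Placeholder FOA waveform and untrained random weights (assumptions, not real data).
foa = [[rng.uniform(-1.0, 1.0) for _ in range(T)] for _ in range(FOA_CHANNELS)]
weights = [
    [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(FOA_CHANNELS)]
    for _ in range(HOA_CHANNELS)
]

hoa = conv1d_same(foa, weights)
print(len(hoa), len(hoa[0]))  # 16 64
```

A real model would stack many such layers with nonlinearities and learn the weights from paired FOA and HOA recordings; the sketch only fixes the input and output shapes.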
