DOA Estimation with Lightweight Network on LLM-Aided Simulated Acoustic Scenes

📅 2025-11-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing DOA estimation models are predominantly trained on synthetic room impulse response (RIR) data, which offers limited acoustic diversity and therefore generalizes poorly to real-world environments. To address this, we propose LightDOA, a lightweight DOA estimator based on depthwise separable convolutions that supports multi-channel input with low computational overhead. Rather than relying on RIR-only synthesis, we train and benchmark on a recently introduced dataset of spatial audio scenes generated with the assistance of large language models (LLMs), which broadens the acoustic coverage and realism of the training data. Experiments across varying reverberation conditions, noise types, and microphone array configurations show that LightDOA achieves high estimation accuracy and strong robustness. Moreover, it attains 3–5× higher inference efficiency than state-of-the-art models, making it particularly suitable for resource-constrained edge devices.

📝 Abstract
Direction-of-Arrival (DOA) estimation is critical in spatial audio and acoustic signal processing, with wide-ranging real-world applications. Most existing DOA models are trained on synthetic data produced by convolving clean speech with room impulse responses (RIRs), which limits their generalizability due to constrained acoustic diversity. In this paper, we revisit DOA estimation using a recently introduced dataset constructed with the assistance of large language models (LLMs), which provides more realistic and diverse spatial audio scenes. We benchmark several representative neural DOA methods on this dataset and propose LightDOA, a lightweight DOA estimation model based on depthwise separable convolutions, specifically designed for multi-channel input in varying environments. Experimental results show that LightDOA achieves satisfactory accuracy and robustness across various acoustic scenes while maintaining low computational complexity. This study not only highlights the potential of spatial audio synthesized with the assistance of LLMs in advancing robust and efficient DOA estimation research, but also establishes LightDOA as an efficient solution for resource-constrained applications.
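The conventional synthesis step that the abstract contrasts against, convolving clean speech with one RIR per microphone, can be illustrated with a minimal pure-Python sketch. The toy signal, the two hand-written RIRs, and the function names `convolve` and `simulate_array` are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
def convolve(signal, rir):
    """Full discrete convolution of a 1-D signal with a 1-D impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(rir):
            out[n + k] += s * h
    return out


def simulate_array(signal, rirs):
    """Return one simulated channel per microphone (one RIR per microphone)."""
    return [convolve(signal, rir) for rir in rirs]


# Hypothetical toy data: a 3-sample "speech" signal and two short RIRs.
speech = [1.0, 0.5, -0.25]
rirs = [
    [1.0, 0.0, 0.3],       # mic 0: direct path at lag 0, one reflection at lag 2
    [0.0, 1.0, 0.0, 0.3],  # mic 1: same response, delayed by one sample
]

mics = simulate_array(speech, rirs)
for i, channel in enumerate(mics):
    print(f"mic {i}: {len(channel)} samples")
```

In this toy case the second channel is a one-sample-delayed copy of the first; inter-channel delays of this kind are the spatial cue a DOA model learns from. Real pipelines use measured or simulated RIRs with thousands of taps and typically an FFT-based convolution.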
Problem

Research questions and friction points this paper is trying to address.

Improving DOA estimation generalizability with LLM-aided diverse acoustic scenes
Developing a lightweight model for robust DOA estimation in varying environments
Addressing computational complexity limitations in multi-channel DOA estimation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-aided generation of simulated acoustic scenes
Lightweight DOA model with depthwise separable convolutions
Multi-channel input processing for varying environments
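To make the "lightweight" claim concrete, the sketch below compares the weight count of a standard 2-D convolution with that of a depthwise separable one (a per-channel k×k depthwise filter followed by a 1×1 pointwise convolution). The layer sizes are hypothetical and are not LightDOA's actual configuration, which this page does not give; bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3

std = standard_conv_params(c_in, c_out, k)
dws = depthwise_separable_params(c_in, c_out, k)

print(f"standard conv:       {std} weights")
print(f"depthwise separable: {dws} weights")
print(f"reduction factor:    {std / dws:.2f}x")
```

For these sizes the standard layer has 73,728 weights against 8,768 for the separable one. The saving grows with kernel size and output channel count, which is why this factorization is common in models aimed at edge devices.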
Haowen Li
Nanyang Technological University, Singapore

Zheng-wu Luo
Nanyang Technological University, Singapore

Dongyuan Shi
Northwestern Polytechnical University, China

Boxiang Wang
Nvidia
Machine Learning, Parallel Processing

Junwei Ji
Nanyang Technological University, Singapore

Ziyi Yang
Nanyang Technological University, Singapore

Woon-Seng Gan
Professor of Audio Engineering and Director of Smart Nation Lab @ Nanyang Technological University
Active Noise Control, Machine & Deep Learning, Spatial Audio, Perceptual Evaluation