An Efficient GPU-based Implementation for Noise Robust Sound Source Localization

📅 2025-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the CPU bottleneck that limits real-time multi-channel sound source localization (SSL) in embedded robot audition systems, a bottleneck caused by computationally intensive matrix operations, this paper proposes a GPU implementation of the noise-robust GSVD-MUSIC algorithm within the open-source HARK platform. The implementation is evaluated on an embedded device (Jetson AGX Orin) and on a server with an NVIDIA A100 GPU. For a 60-channel microphone array, it achieves speedups of 4645.1× for the GSVD calculation and 8.8× for the entire SSL module on the Jetson AGX Orin, and 2223.4× for the GSVD calculation and 8.95× for the entire SSL module on the A100 server. These gains make real-time SSL feasible for large-scale microphone arrays and leave capacity for subsequent machine learning or deep learning tasks.

📝 Abstract
Robot audition, encompassing Sound Source Localization (SSL), Sound Source Separation (SSS), and Automatic Speech Recognition (ASR), enables robots and smart devices to acquire auditory capabilities similar to human hearing. Although these capabilities are widely applicable, processing multi-channel audio signals from microphone arrays in SSL involves computationally intensive matrix operations, which can hinder efficient deployment on Central Processing Units (CPUs), particularly in embedded systems with limited CPU resources. This paper introduces a GPU-based implementation of SSL for robot audition that uses Generalized Singular Value Decomposition-based Multiple Signal Classification (GSVD-MUSIC), a noise-robust algorithm, within HARK, an open-source software suite. For a 60-channel microphone array, the proposed implementation achieves significant performance improvements. On the Jetson AGX Orin, an embedded device powered by an NVIDIA GPU and ARM Cortex-A78AE v8.2 64-bit CPUs, we observe speedups of 4645.1x for the GSVD calculation and 8.8x for the entire SSL module. On a server configured with an NVIDIA A100 GPU and AMD EPYC 7352 CPUs, the speedups are 2223.4x for the GSVD calculation and 8.95x for the entire SSL module. These results make real-time processing feasible for large-scale microphone arrays and leave ample capacity for real-time processing of subsequent machine learning or deep learning tasks.
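To make the "matrix operations" in the abstract concrete, here is a minimal CPU sketch in NumPy of the general shape of a GSVD-MUSIC computation for a single frequency bin: whiten the signal correlation matrix R by a noise correlation matrix K, take an SVD, and project steering vectors onto the noise subspace. It is an illustration under assumptions, not the paper's GPU implementation or HARK's code. The function names, the uniform linear array geometry, and the choice of the left singular vectors are all assumptions made for this example.

```python
import numpy as np

def steering_vectors(n_mics, angles_rad, spacing_over_wavelength=0.5):
    """Far-field steering vectors for a hypothetical uniform linear array."""
    m = np.arange(n_mics)[:, None]                       # (M, 1) mic indices
    phase = -2j * np.pi * spacing_over_wavelength * m * np.sin(angles_rad)[None, :]
    return np.exp(phase)                                 # (M, n_angles)

def gsvd_music_spectrum(R, K, A, n_sources):
    """MUSIC pseudo-spectrum from the SVD of K^{-1} R for one frequency bin.

    R: (M, M) correlation matrix of the observed signal.
    K: (M, M) correlation matrix of the noise.
    A: (M, n_angles) candidate steering vectors.
    """
    # Solve K X = R rather than forming K^{-1} explicitly.
    whitened = np.linalg.solve(K, R)
    # np.linalg.svd returns singular values in descending order.
    U, _, _ = np.linalg.svd(whitened)
    noise = U[:, n_sources:]                             # assumed noise-subspace basis
    num = np.sum(np.abs(A) ** 2, axis=0)                 # |a^H a| per angle
    den = np.sum(np.abs(noise.conj().T @ A), axis=0)     # sum_m |e_m^H a| per angle
    return num / den
```

Peaks of the returned spectrum indicate candidate source directions. When K is a scaled identity matrix (spatially white noise), the whitening step has no effect on the subspaces and the sketch reduces to standard MUSIC. In a full system this computation is repeated for every frequency bin and time block, which is why the decomposition cost grows quickly with the number of microphones.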
Problem

Research questions and friction points this paper is trying to address.

Efficient GPU-based SSL for noise-robust robot audition
Accelerating GSVD-MUSIC algorithm for real-time microphone array processing
Overcoming CPU limitations in embedded systems with GPU optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-based implementation for noise-robust SSL
Utilizes GSVD-MUSIC algorithm in HARK platform
Achieves significant speedups on embedded and server GPUs
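The reported GSVD speedup (4645.1x) is far larger than the speedup of the whole SSL module (8.8x) on the Jetson AGX Orin. Amdahl's law gives one way to read such a gap. The sketch below is a back-of-the-envelope calculation only: it assumes, hypothetically, that GSVD is the only accelerated stage, which the abstract does not state, and the function names are made up for this example.

```python
def accelerated_fraction(overall, stage):
    """Invert Amdahl's law: fraction of baseline runtime spent in the accelerated stage."""
    return (1 - 1 / overall) / (1 - 1 / stage)

def amdahl_speedup(fraction, stage):
    """Amdahl's law: overall speedup when `fraction` of the runtime is sped up by `stage`."""
    return 1 / ((1 - fraction) + fraction / stage)

# Hypothetical reading of the Jetson AGX Orin figures (assumes only GSVD is accelerated).
p = accelerated_fraction(overall=8.8, stage=4645.1)
ceiling = 1 / (1 - p)   # limit of the overall speedup as the stage speedup grows without bound
```

Under that assumption, p comes out at roughly 0.89, and the ceiling sits only slightly above 8.8. In other words, once a dominant stage is accelerated by three orders of magnitude, the remaining stages determine the end-to-end speedup.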
Zirui Lin
Dept. of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo, Japan
Masayuki Takigahira
Honda Research Institute Japan Co., Ltd., Saitama, Japan
Naoya Terakado
Honda Research Institute Japan Co., Ltd., Saitama, Japan
Haris Gulzar
NTT Software Innovation Center, Tokyo, Japan
M. Busto
NTT Software Innovation Center, Tokyo, Japan
Takeharu Eda
NTT Software Innovation Center
Computer vision, surveillance, databases, Web search
Katsutoshi Itoyama
Tokyo Institute of Technology
Music information processing, statistical signal processing, machine learning
Kazuhiro Nakadai
Institute of Science Tokyo
Robot Audition and Scene Analysis, Artificial Intelligence, Signal and Speech Processing, Robotics
Hideharu Amano
Keio University
Computer Architecture, Reconfigurable System, Interconnection Network