Audio-Based Pedestrian Detection in the Presence of Vehicular Noise

📅 2025-09-23
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the challenging problem of audio-based pedestrian detection in high-noise traffic environments—particularly from roadside acoustic perspectives. We propose the first dedicated analytical framework for roadside acoustic perception, built upon a large-scale, synchronized audio-visual dataset comprising 1,321 hours of real-world road recordings, meticulously annotated with frame-level pedestrian labels and video thumbnails, and characterized by intense vehicular noise. Methodologically, we fuse 16-kHz audio with 1-fps visual cues to enable multimodal alignment, and conduct comprehensive evaluations including cross-dataset benchmarking, noise impact modeling, and cross-domain robustness testing. Experimental results demonstrate the critical role of acoustic context in detection performance and quantitatively reveal substantial degradation of existing models under complex noise conditions. Our contributions include: (1) the first public benchmark dataset for auditory pedestrian perception in traffic; (2) a reproducible, multimodal analytical framework; and (3) key empirical findings that advance understanding of audio-visual sensing in noisy urban environments.
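The summary above describes fusing 16 kHz audio with 1 fps visual cues for multimodal alignment. A minimal sketch of what that alignment implies, since each 1 fps video frame (and its frame-level pedestrian label) corresponds to one second of audio, i.e. 16,000 samples (the helper name and toy data below are illustrative, not the paper's actual code):

```python
import numpy as np

SR = 16_000        # audio sample rate stated in the paper
FPS = 1            # video thumbnail rate stated in the paper
SAMPLES_PER_FRAME = SR // FPS  # 16,000 audio samples per annotated frame

def align_audio_to_frames(audio, labels):
    """Slice a mono 16 kHz waveform into 1-second windows, one per
    frame-level pedestrian label (hypothetical helper)."""
    n_frames = min(len(audio) // SAMPLES_PER_FRAME, len(labels))
    windows = audio[: n_frames * SAMPLES_PER_FRAME].reshape(n_frames, SAMPLES_PER_FRAME)
    return windows, np.asarray(labels[:n_frames])

# toy example: 3.5 s of silence paired with three frame labels
audio = np.zeros(int(3.5 * SR))
windows, y = align_audio_to_frames(audio, [0, 1, 0])
print(windows.shape)  # (3, 16000)
```

The trailing half-second of audio is dropped because it has no matching frame label.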

📝 Abstract
Audio-based pedestrian detection is a challenging task and has, thus far, only been explored in noise-limited environments. We present a new dataset, results, and a detailed analysis of the state-of-the-art in audio-based pedestrian detection in the presence of vehicular noise. In our study, we conduct three analyses: (i) cross-dataset evaluation between noisy and noise-limited environments, (ii) an assessment of the impact of noisy data on model performance, highlighting the influence of acoustic context, and (iii) an evaluation of the model's predictive robustness on out-of-domain sounds. The new dataset is a comprehensive 1,321-hour roadside dataset. It incorporates traffic-rich soundscapes. Each recording includes 16 kHz audio synchronized with frame-level pedestrian annotations and 1 fps video thumbnails.
Problem

Research questions and friction points this paper is trying to address.

Detecting pedestrians using audio in noisy vehicular environments
Evaluating model performance degradation due to traffic noise interference
Assessing predictive robustness on out-of-domain acoustic contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-dataset evaluation between noisy and noise-limited environments
Assessment of noisy data impact on model performance
Evaluation of model robustness on out-of-domain sounds
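The cross-dataset evaluation listed above can be sketched as follows: fit a detector on one acoustic domain and test it on both domains to quantify degradation under vehicular noise. Everything here is a hedged toy illustration with synthetic data and a trivial energy-threshold detector, not the paper's models or dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(noise_level, n=200, sr=16_000):
    """Toy stand-in for roadside audio: 1 s clips where pedestrian-positive
    clips carry a weak 440 Hz tone on top of Gaussian noise."""
    y = rng.integers(0, 2, n)
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)
    X = rng.normal(0, noise_level, (n, sr)) + y[:, None] * 0.5 * tone
    return X, y

def fit_energy_threshold(X, y):
    """Trivial placeholder model: midpoint of per-class mean energies."""
    e = (X ** 2).mean(axis=1)
    return (e[y == 0].mean() + e[y == 1].mean()) / 2

def accuracy(X, y, thr):
    preds = ((X ** 2).mean(axis=1) > thr).astype(int)
    return (preds == y).mean()

# cross-dataset evaluation: train on the quiet domain, test on both
Xq, yq = make_dataset(noise_level=0.1)   # noise-limited domain
Xn, yn = make_dataset(noise_level=1.0)   # vehicular-noise domain
thr = fit_energy_threshold(Xq, yq)
print(f"quiet->quiet acc: {accuracy(Xq, yq, thr):.2f}")
print(f"quiet->noisy acc: {accuracy(Xn, yn, thr):.2f}")
```

The in-domain score stays high while the cross-domain score collapses toward chance, mirroring the degradation under complex noise that the paper reports.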
Yonghyun Kim
Music Informatics Group, Georgia Institute of Technology, USA
Chaeyeon Han
Ph.D. student, School of City and Regional Planning, Georgia Institute of Technology
urban informatics, urban planning, human mobility, climate migration, climate mobility
Akash Sarode
College of Computing, Georgia Institute of Technology, USA
Noah Posner
Center for Urban Resilience and Analytics, Georgia Institute of Technology, USA
S. Guhathakurta
Center for Urban Resilience and Analytics, Georgia Institute of Technology, USA
Alexander Lerch
Music Informatics Group, Georgia Institute of Technology
audio content analysis, music information retrieval, semantic audio, audio signal processing, music generation