🤖 AI Summary
This work addresses the challenging problem of audio-based pedestrian detection in high-noise traffic environments, particularly from roadside acoustic perspectives. We propose the first dedicated analytical framework for roadside acoustic perception, built upon a large-scale, synchronized audio-visual dataset comprising 1,321 hours of real-world road recordings, meticulously annotated with frame-level pedestrian labels and video thumbnails, and characterized by intense vehicular noise. Methodologically, we fuse 16 kHz audio with 1 fps visual cues to enable multimodal alignment, and conduct comprehensive evaluations including cross-dataset benchmarking, noise-impact modeling, and cross-domain robustness testing. Experimental results demonstrate the critical role of acoustic context in detection performance and quantitatively reveal substantial degradation of existing models under complex noise conditions. Our contributions include: (1) the first public benchmark dataset for auditory pedestrian perception in traffic; (2) a reproducible, multimodal analytical framework; and (3) key empirical findings that advance understanding of audio-visual sensing in noisy urban environments.
📝 Abstract
Audio-based pedestrian detection is a challenging task and has, thus far, only been explored in noise-limited environments. We present a new dataset, results, and a detailed analysis of the state of the art in audio-based pedestrian detection in the presence of vehicular noise. In our study, we conduct three analyses: (i) cross-dataset evaluation between noisy and noise-limited environments, (ii) an assessment of the impact of noisy data on model performance, highlighting the influence of acoustic context, and (iii) an evaluation of the models' predictive robustness on out-of-domain sounds. The new dataset comprises 1,321 hours of roadside recordings with traffic-rich soundscapes; each recording includes 16 kHz audio synchronized with frame-level pedestrian annotations and 1 fps video thumbnails.
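The combination of 16 kHz audio with 1 fps frame-level labels implies a straightforward alignment: one 1-second audio window per labelled frame. The sketch below illustrates this alignment under stated assumptions; the function name, array layout, and binary presence labels are hypothetical and are not part of any released dataset API.

```python
import numpy as np

SAMPLE_RATE = 16_000   # 16 kHz audio, as described in the abstract
FRAME_RATE = 1         # 1 fps thumbnails and frame-level labels

def align_audio_to_frames(audio: np.ndarray, frame_labels: np.ndarray):
    """Slice a mono waveform into 1-second windows, one per labelled frame.

    audio        : float array of shape (num_samples,) sampled at 16 kHz
    frame_labels : int array of shape (num_frames,), 1 = pedestrian present
    Returns (windows, labels), where windows has shape (num_frames, 16000).
    """
    samples_per_frame = SAMPLE_RATE // FRAME_RATE
    # Keep only as many whole windows as we have both audio and labels for.
    num_frames = min(len(frame_labels), len(audio) // samples_per_frame)
    windows = audio[: num_frames * samples_per_frame].reshape(num_frames, samples_per_frame)
    return windows, frame_labels[:num_frames]

# Usage with synthetic data: 10 s of noise and 10 frame-level labels.
audio = np.random.randn(10 * SAMPLE_RATE).astype(np.float32)
labels = np.random.randint(0, 2, size=10)
windows, labels = align_audio_to_frames(audio, labels)
print(windows.shape, labels.shape)  # (10, 16000) (10,)
```

The same windowing can feed any audio classifier; the only dataset-specific detail used here is the 16 kHz / 1 fps synchronization stated above.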