🤖 AI Summary
To address the trade-off between classification accuracy and inference efficiency in real-time acoustic traffic speed monitoring, this paper proposes an adaptive audio frame selection framework integrating deep learning and reinforcement learning. Methodologically, a dual-branch multi-scale convolutional neural network (BMCNN) is designed to extract MFCC and wavelet time-frequency features in parallel; an attention-enhanced deep Q-network (DQN) is introduced to enable dynamic frame sampling and early decision-making. The key contribution lies in embedding reinforcement learning directly into the classification pipeline, enabling end-to-end optimization of frame selection, thus achieving high accuracy without sacrificing efficiency. Evaluations on the IDMT-Traffic and SZUR-Acoustic datasets yield classification accuracies of 95.99% and 92.3%, respectively, with an average 1.63× speedup in processing. Our approach consistently outperforms baseline methods including A3C, Dueling Double DQN (DDDQN), Self-Attention Actor-Critic (SA2C), PPO, and TD3.
📄 Abstract
Traffic congestion remains a pressing urban challenge, requiring intelligent transportation systems for real-time management. We present a hybrid framework that combines deep learning and reinforcement learning for acoustic vehicle speed classification. A dual-branch BMCNN processes MFCC and wavelet features to capture complementary frequency patterns. An attention-enhanced DQN adaptively selects the minimal number of audio frames and triggers early decisions once confidence thresholds are reached. Evaluations on IDMT-Traffic and our SZUR-Acoustic (Suzhou) datasets show 95.99% and 92.3% accuracy, with up to 1.63× faster average processing via early termination. Compared with A3C, DDDQN, SA2C, PPO, and TD3, the method provides a superior accuracy-efficiency trade-off and is suitable for real-time ITS deployment in heterogeneous urban environments.
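The confidence-triggered early-decision idea described above can be illustrated with a minimal sketch. The helper below is hypothetical (the paper's DQN policy and BMCNN classifier are not reproduced here): it simply accumulates per-frame class logits and stops as soon as the running softmax confidence for any class crosses a threshold, which is the mechanism that lets a stream be classified before all frames are processed.

```python
import numpy as np

def early_decision(frame_logits: np.ndarray, threshold: float = 0.9):
    """Classify an audio clip frame-by-frame with early termination.

    frame_logits: array of shape (n_frames, n_classes) -- per-frame
        class logits from some upstream classifier (assumes >= 1 frame).
    threshold: softmax-confidence level that triggers an early decision.
    Returns (predicted_class, frames_used).
    """
    acc = np.zeros(frame_logits.shape[1])
    for i, logits in enumerate(frame_logits, start=1):
        acc += logits
        # Softmax over the mean logits seen so far.
        e = np.exp(acc / i - (acc / i).max())
        probs = e / e.sum()
        if probs.max() >= threshold:
            return int(probs.argmax()), i  # confident: stop early
    return int(probs.argmax()), len(frame_logits)  # used every frame
```

In the full framework the stopping rule is learned by the attention-enhanced DQN rather than fixed, but the payoff is the same: clips whose evidence is unambiguous early on consume fewer frames, which is where the reported average speedup comes from.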