A Dynamic Transformer Network for Vehicle Detection

📅 2025-06-03
🏛️ IEEE transactions on consumer electronics
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address degraded vehicle detection performance under complex illumination and occlusion, this paper proposes DTNet, a dynamic Transformer-based network. Methodologically, DTNet introduces a dynamic convolutional weight-guidance mechanism for adaptive feature modulation; designs a hybrid attention module that integrates channel attention with Transformer self-attention to jointly capture global contextual dependencies and local discriminative patterns; incorporates a translation-variant convolution to enhance awareness of geometric structure; and fuses multi-scale features for robust representation learning. Evaluated on vehicle detection benchmarks, DTNet achieves competitive performance, with notable gains in detection accuracy and robustness under challenging scenarios such as low-light conditions and partial occlusion. These results support the effectiveness of dynamic modeling and the proposed hybrid attention mechanism.
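The summary above describes a dynamic convolutional weight-guidance mechanism, where kernel weights are generated conditioned on the input rather than fixed after training. The paper's exact formulation is not given on this page; a minimal sketch of the common dynamic-convolution pattern (input-conditioned attention blending K candidate kernels, with all names and weights here hypothetical) might look like:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dynamic_kernel(context, kernels, attn_weights):
    """Blend K candidate kernels using input-conditioned attention.

    context: floats summarizing the input (e.g. pooled features)
    kernels: K candidate kernels, each a flat list of floats
    attn_weights: K rows, each producing one attention logit from context
    Returns a single kernel adapted to this particular input.
    """
    logits = [sum(w * c for w, c in zip(row, context)) for row in attn_weights]
    alphas = softmax(logits)
    n = len(kernels[0])
    # Convex combination of the candidate kernels, weighted per input.
    return [sum(a * k[i] for a, k in zip(alphas, kernels)) for i in range(n)]
```

With a context that produces equal logits, the blended kernel is simply the average of the candidates; a context favoring one candidate shifts the kernel toward it, which is what lets the detector adapt to different lighting or occlusion statistics.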

📝 Abstract
Stable consumer electronic systems can better assist traffic. Good traffic-oriented consumer electronic systems require collaborative work between traffic algorithms and hardware. However, the performance of popular traffic algorithms, including deep-network-based vehicle detection methods that learn data relations rather than the differences across lighting conditions and occlusions, is limited. In this paper, we present a dynamic Transformer network for vehicle detection (DTNet). DTNet utilizes a dynamic convolution to guide a deep network to dynamically generate weights, enhancing the adaptability of the obtained detector. Taking the relations among different kinds of information into account, a mixed attention mechanism based on channel attention and a Transformer is exploited to strengthen relations among channels and pixels and extract more salient information for vehicle detection. To account for differences across an image, a translation-variant convolution relies on spatial location information to refine the obtained structural information for vehicle detection. Experimental results illustrate that our DTNet is competitive for vehicle detection. Code of the proposed DTNet can be obtained at https://github.com/hellloxiaotian/DTNet.
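The abstract's mixed attention mechanism combines channel attention with Transformer self-attention. The channel-attention half typically follows the squeeze-and-excitation pattern: globally pool each channel, pass the pooled vector through a small gating MLP, and rescale channels by the resulting sigmoid gates. A minimal sketch of that pattern (not the paper's exact module; `w1`/`w2` are hypothetical gating weights):

```python
import math

def channel_attention(feature_maps, w1, w2):
    """Squeeze-and-excitation style channel attention.

    feature_maps: list of C channels, each a flat list of pixel values
    w1: hidden x C weights, w2: C x hidden weights (the small gating MLP)
    Returns the feature maps rescaled by per-channel gates in (0, 1).
    """
    # Squeeze: global average pool per channel.
    pooled = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: two-layer MLP, ReLU then sigmoid gates.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Rescale every channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feature_maps)]
```

Channel attention captures which feature channels matter; the Transformer self-attention half (omitted here) models pixel-to-pixel relations, and the abstract's "mixed" mechanism combines the two views.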
Problem

Research questions and friction points this paper is trying to address.

Improving vehicle detection in varying lighting and occlusion conditions
Enhancing adaptability of detectors through dynamic weight generation
Strengthening channel and pixel relations for salient feature extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Transformer network for vehicle detection
Mixed attention mechanism enhances salient information
Translation-variant convolution refines structural information
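A translation-variant convolution, as named in the innovation list, drops the weight-sharing of ordinary convolution: the kernel is allowed to differ at each spatial position so the filter can follow local structure. A 1-D toy sketch of the idea (not the paper's implementation; the per-position `kernel_bank` is hypothetical):

```python
def translation_variant_conv1d(signal, kernel_bank):
    """1-D convolution whose 3-tap kernel differs at every position.

    signal: list of floats, zero-padded implicitly at the borders
    kernel_bank: one 3-tap kernel per output position, letting the
    filter adapt to local structure instead of being shared globally.
    """
    out = []
    n = len(signal)
    for i, k in enumerate(kernel_bank):
        acc = 0.0
        for j, w in enumerate(k):      # taps at offsets -1, 0, +1
            idx = i + j - 1
            if 0 <= idx < n:           # zero padding outside the signal
                acc += w * signal[idx]
        out.append(acc)
    return out
```

If every position uses the identity kernel `[0, 1, 0]` this reduces to a pass-through; choosing different kernels per position (e.g. smoothing in flat regions, sharpening near edges) is what lets spatial location information refine structural features.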
Chunwei Tian
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Kai Liu
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Bob Zhang
University of Macau
Biometrics, pattern recognition, image processing
Zhixiang Huang
Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education and Key Laboratory of Electromagnetic Environmental Sensing, Anhui University, Hefei, 230601, China
Chia-Wen Lin
Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan
David Zhang
School of Data Science, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China