A Dynamic Transformer Network for Vehicle Detection

📅 2025-06-03
🏛️ IEEE transactions on consumer electronics
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address degraded vehicle detection performance under complex illumination and occlusion, this paper proposes DTNet, a dynamic Transformer-based network. Methodologically, DTNet introduces a dynamic convolutional weight-guidance mechanism for adaptive feature modulation; designs a hybrid attention module that integrates channel attention with Transformer self-attention to jointly capture global contextual dependencies and local discriminative patterns; incorporates a translation-variant convolution to enhance awareness of geometric structure; and fuses multi-scale features for robust representation learning. Evaluated on vehicle detection benchmarks, DTNet achieves competitive performance, with notable gains in detection accuracy and robustness under challenging scenarios such as low-light conditions and partial occlusion. These results support the effectiveness of dynamic modeling and the proposed hybrid attention mechanism.
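The summary above describes a dynamic convolutional weight-guidance mechanism, where kernel weights are generated conditioned on the input rather than fixed after training. The paper's exact formulation is not given on this page; a minimal sketch of the common dynamic-convolution pattern (input-conditioned attention blending K candidate kernels, with all names and weights here hypothetical) might look like:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dynamic_kernel(context, kernels, attn_weights):
    """Blend K candidate kernels using input-conditioned attention.

    context: floats summarizing the input (e.g. pooled features)
    kernels: K candidate kernels, each a flat list of floats
    attn_weights: K rows, each producing one attention logit from context
    Returns a single kernel adapted to this particular input.
    """
    logits = [sum(w * c for w, c in zip(row, context)) for row in attn_weights]
    alphas = softmax(logits)
    n = len(kernels[0])
    # Convex combination of the candidate kernels, weighted per input.
    return [sum(a * k[i] for a, k in zip(alphas, kernels)) for i in range(n)]
```

With a context that produces equal logits, the blended kernel is simply the average of the candidates; a context favoring one candidate shifts the kernel toward it, which is what lets the detector adapt to different lighting or occlusion statistics.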

📝 Abstract
Stable consumer electronic systems can better assist traffic. Good traffic-oriented consumer electronic systems require collaborative work between traffic algorithms and hardware. However, the performance of popular traffic algorithms, including deep-network-based vehicle detection methods that learn data relations rather than the differences across lighting conditions and occlusions, is limited. In this paper, we present a dynamic Transformer network for vehicle detection (DTNet). DTNet utilizes a dynamic convolution to guide a deep network to dynamically generate weights, enhancing the adaptability of the obtained detector. Taking the relations among different kinds of information into account, a mixed attention mechanism based on channel attention and a Transformer is exploited to strengthen relations among channels and pixels and extract more salient information for vehicle detection. To account for differences across an image, a translation-variant convolution relies on spatial location information to refine the obtained structural information for vehicle detection. Experimental results illustrate that our DTNet is competitive for vehicle detection. Code of the proposed DTNet can be obtained at https://github.com/hellloxiaotian/DTNet.
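The abstract's mixed attention mechanism combines channel attention with Transformer self-attention. The channel-attention half typically follows the squeeze-and-excitation pattern: globally pool each channel, pass the pooled vector through a small gating MLP, and rescale channels by the resulting sigmoid gates. A minimal sketch of that pattern (not the paper's exact module; `w1`/`w2` are hypothetical gating weights):

```python
import math

def channel_attention(feature_maps, w1, w2):
    """Squeeze-and-excitation style channel attention.

    feature_maps: list of C channels, each a flat list of pixel values
    w1: hidden x C weights, w2: C x hidden weights (the small gating MLP)
    Returns the feature maps rescaled by per-channel gates in (0, 1).
    """
    # Squeeze: global average pool per channel.
    pooled = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: two-layer MLP, ReLU then sigmoid gates.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Rescale every channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feature_maps)]
```

Channel attention captures which feature channels matter; the Transformer self-attention half (omitted here) models pixel-to-pixel relations, and the abstract's "mixed" mechanism combines the two views.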
Problem

Research questions and friction points this paper is trying to address.

Improving vehicle detection in varying lighting and occlusion conditions
Enhancing adaptability of detectors through dynamic weight generation
Strengthening channel and pixel relations for salient feature extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Transformer network for vehicle detection
Mixed attention mechanism enhances salient information
Translation-variant convolution refines structural information
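A translation-variant convolution, as named in the innovation list, drops the weight-sharing of ordinary convolution: the kernel is allowed to differ at each spatial position so the filter can follow local structure. A 1-D toy sketch of the idea (not the paper's implementation; the per-position `kernel_bank` is hypothetical):

```python
def translation_variant_conv1d(signal, kernel_bank):
    """1-D convolution whose 3-tap kernel differs at every position.

    signal: list of floats, zero-padded implicitly at the borders
    kernel_bank: one 3-tap kernel per output position, letting the
    filter adapt to local structure instead of being shared globally.
    """
    out = []
    n = len(signal)
    for i, k in enumerate(kernel_bank):
        acc = 0.0
        for j, w in enumerate(k):      # taps at offsets -1, 0, +1
            idx = i + j - 1
            if 0 <= idx < n:           # zero padding outside the signal
                acc += w * signal[idx]
        out.append(acc)
    return out
```

If every position uses the identity kernel `[0, 1, 0]` this reduces to a pass-through; choosing different kernels per position (e.g. smoothing in flat regions, sharpening near edges) is what lets spatial location information refine structural features.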
Chunwei Tian
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Kai Liu
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Bob Zhang
University of Macau
Biometrics, pattern recognition, image processing
Zhixiang Huang
Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education and Key Laboratory of Electromagnetic Environmental Sensing, Anhui University, Hefei, 230601, China
Chia-Wen Lin
Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan
David Zhang
School of Data Science, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China