🤖 AI Summary
To address the significant performance degradation of RT-DETR under adverse weather such as fog, this paper proposes a weather-robust detection framework with three key components: (1) a fog-sensitive scaled attention mechanism that dynamically models the effect of fog density on feature responses; (2) a dual-stream encoder that fuses complementary weather-related information from clear and foggy images via self- and cross-attention; and (3) perceptual-loss-guided teacher–student distillation of domain-invariant features for label-free domain adaptation. Experimental results indicate that no individual module consistently outperforms the baseline; however, the study systematically identifies a critical bottleneck: the tight coupling between weather modeling and the attention mechanism. This work establishes the first interpretable analytical benchmark for weather-aware object detection and provides concrete, actionable directions for future improvement.
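The fog-sensitive scaled attention mechanism is only described at a high level here. A minimal sketch of one plausible formulation, in which a scalar fog-density estimate rescales the attention logits before the softmax, is shown below; the function and parameter names (`fog_scaled_attention`, `fog_density`) are illustrative, not the paper's API:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fog_scaled_attention(q, k, v, fog_density):
    """Scaled dot-product attention whose logits are modulated by a
    scalar fog-density estimate in [0, 1].

    Hypothetical formulation: denser fog sharpens the attention
    distribution by scaling the logits by (1 + fog_density). The actual
    fog-conditioned scaling in the paper may differ.
    """
    d = q.shape[-1]
    logits = (q @ k.T) / np.sqrt(d)      # standard scaled dot-product
    logits = logits * (1.0 + fog_density)  # fog-sensitive rescaling
    weights = softmax(logits)            # rows sum to 1
    return weights @ v
```

With `fog_density = 0` this reduces to ordinary scaled dot-product attention, which makes the fog-conditioned behavior easy to ablate against the baseline.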
📝 Abstract
RT-DETR has shown strong performance across various computer vision tasks but is known to degrade under challenging weather conditions such as fog. In this work, we investigate three novel approaches to enhance RT-DETR robustness in foggy environments: (1) Domain Adaptation via Perceptual Loss, which distills domain-invariant features from a teacher network to a student using perceptual supervision; (2) Weather Adaptive Attention, which augments the attention mechanism with fog-sensitive scaling by introducing an auxiliary foggy image stream; and (3) Weather Fusion Encoder, which integrates a dual-stream encoder architecture that fuses clear and foggy image features via multi-head self- and cross-attention. Despite these architectural innovations, none of the proposed methods consistently outperforms the baseline RT-DETR. We analyze the limitations and potential causes, offering insights for future research in weather-aware object detection.
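The perceptual-supervision objective in approach (1) can be sketched as a feature-space distance between a frozen teacher (fed clear images) and a student (fed foggy images). The sketch below assumes a simple per-layer mean-squared error; the paper's exact layer selection and loss weighting are not specified here:

```python
import numpy as np

def perceptual_distillation_loss(teacher_feats, student_feats):
    """Perceptual-style distillation loss: mean-squared error between
    corresponding intermediate feature maps of a frozen teacher network
    (clear images) and a student network (foggy images).

    A generic sketch under stated assumptions; inputs are lists of
    same-shaped feature arrays, one per supervised layer.
    """
    layer_losses = [np.mean((t - s) ** 2)
                    for t, s in zip(teacher_feats, student_feats)]
    return float(np.mean(layer_losses))  # unweighted average over layers
```

Driving this loss toward zero pushes the student's foggy-image features toward the teacher's clear-image features, which is what makes the adaptation label-free: only paired clear/foggy inputs are needed, not box annotations.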