๐ค AI Summary
Thermal imaging cameras offer robust perception under low-light conditions but struggle to distinguish traffic signs with similar thermal emissivity (e.g., road signs and license plates), leading to semantic understanding failures in autonomous driving. To address this nighttime sign perception blind spot, we propose an unsupervised thermalโevent video fusion enhancement method. Our approach features: (1) a motion-guided spatiotemporal alignment network that leverages coarse motion cues from thermal frames to synchronize asynchronous event streams; and (2) a detail enhancement module that exploits high-temporal-resolution event signals to compensate for texture deficiencies in thermal imagery, enabling cross-modal complementarity and temporally consistent representation. Evaluated on a real-world low-light dataset, our method significantly improves sign contour generation quality and detection accuracy (mAP increased by 12.7%), thereby enhancing the robustness of nighttime semantic perception.
๐ Abstract
The thermal camera excels at perceiving outdoor environments under low-light conditions, making it ideal for applications such as nighttime autonomous driving and unmanned navigation. However, thermal cameras encounter challenges when capturing signage from objects made of similar materials, which can pose safety risks for accurately understanding semantics in autonomous driving systems. In contrast, the neuromorphic vision camera, also known as an event camera, detects changes in light intensity asynchronously and has proven effective in high-speed, low-light traffic environments. Recognizing the complementary characteristics of these two modalities, this paper proposes UTA-Sign, an unsupervised thermal-event video augmentation for traffic signage in low-illumination environments, targeting elements such as license plates and roadblock indicators. To address the signage blind spots of thermal imaging and the non-uniform sampling of event cameras, we developed a dual-boosting mechanism that fuses thermal frames and event signals for consistent signage representation over time. The proposed method utilizes thermal frames to provide accurate motion cues as temporal references for aligning the uneven event signals. At the same time, event signals contribute subtle signage content to the raw thermal frames, enhancing the overall understanding of the environment. The proposed method is validated on datasets collected from real-world scenarios, demonstrating superior quality in traffic signage sketching and improved detection accuracy at the perceptual level.