🤖 AI Summary
To address the failure of visible-light navigation for unmanned aerial vehicles (UAVs) under low-visibility conditions (e.g., nighttime or smoke), this paper proposes a task-oriented thermal-visible cross-spectral point feature learning method. Unlike conventional cross-modal learning paradigms that rely on appearance-based alignment, the approach introduces an end-to-end differentiable framework supervised by geometric matching and image registration, jointly optimizing keypoint detection, description, and registration. By decoupling cross-modal features and enforcing task-driven joint training, the method significantly improves geometric consistency between thermal and visible-light images. On the MultiPoint dataset, more than 75% of registration estimates achieve a corner error below 10 pixels. Moreover, the method integrates seamlessly with existing matching and registration pipelines, establishing a robust, embeddable cross-spectral perception paradigm for visual navigation in challenging environments.
📄 Abstract
Unmanned aerial vehicles (UAVs) enable operations in remote and hazardous environments, yet the visible-spectrum, camera-based navigation systems they often rely on struggle in low-visibility conditions. Thermal cameras, which capture long-wave infrared radiation, function effectively in darkness and smoke, where visible-light cameras fail. This work explores learned cross-spectral (thermal-visible) point features as a means of integrating thermal imagery into established camera-based navigation systems. Existing methods typically supervise a feature network's detection and description outputs directly, which tends to focus training on image regions where thermal and visible-spectrum images exhibit similar appearance. To utilize the available data more fully, we propose training the feature network on the tasks of matching and registration: we run the feature network on thermal-visible image pairs, then feed the network response into a differentiable registration pipeline and apply losses to the pipeline's matching and registration estimates. Our selected model, trained on the matching task, achieves a registration error (corner error) below 10 pixels for more than 75% of estimates on the MultiPoint dataset. We further demonstrate that the model can also be used with a classical matching and registration pipeline.
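The corner-error metric quoted above is a standard way to score a homography estimate: warp the four image corners with both the estimated and ground-truth homographies and average the resulting displacements. A minimal NumPy sketch of that computation follows; the function names are illustrative and not taken from the paper's code.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of pixel points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]  # back to Cartesian coords

def corner_error(H_est, H_gt, width, height):
    """Mean distance (pixels) between the four image corners warped by
    the estimated vs. ground-truth homographies."""
    corners = np.array([[0, 0], [width, 0], [width, height], [0, height]], float)
    diff = warp_points(H_est, corners) - warp_points(H_gt, corners)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

# Example: an estimate that is off by a pure (3, 4)-pixel translation
H_gt = np.eye(3)
H_est = np.array([[1.0, 0.0, 3.0],
                  [0.0, 1.0, 4.0],
                  [0.0, 0.0, 1.0]])
err = corner_error(H_est, H_gt, width=640, height=512)  # 5.0 px for every corner
```

Under the paper's evaluation criterion, an estimate like this one (corner error 5.0 px) would fall in the sub-10-pixel bucket reported for more than 75% of cases.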