🤖 AI Summary
To address the lack of a dedicated multimodal image registration benchmark for unmanned aerial vehicle (UAV) aerial scenarios, this paper introduces ATR-UMMIM, the first publicly available benchmark designed for complex imaging conditions. The dataset comprises 7,969 triplets of visible, infrared, and registered visible images, captured across multiple altitudes, viewing angles, and weather conditions. Each triplet carries pixel-level registration ground truth, produced by a semi-automated annotation pipeline for high-precision labeling, along with six imaging-condition attributes introduced as novel annotations. Additionally, ATR-UMMIM provides 77,753 visible-light and 78,409 infrared bounding boxes across 11 object classes. This comprehensive annotation enables robust training and evaluation of multimodal registration, fusion, and detection algorithms, as well as downstream-task research. ATR-UMMIM thus establishes a foundational resource for advancing multimodal perception in challenging UAV-based remote sensing applications.
📝 Abstract
Multimodal fusion has become a key enabler for UAV-based object detection, as each modality provides complementary cues for robust feature extraction. However, due to significant differences in resolution, field of view, and sensing characteristics across modalities, accurate registration is a prerequisite for fusion. Despite its importance, there is currently no publicly available benchmark specifically designed for multimodal registration in UAV-based aerial scenarios, which severely limits the development and evaluation of advanced registration methods under real-world conditions. To bridge this gap, we present ATR-UMMIM, the first benchmark dataset specifically tailored for multimodal image registration in UAV-based applications. The dataset includes 7,969 triplets of raw visible, infrared, and precisely registered visible images captured across diverse scenarios, covering flight altitudes from 80 m to 300 m, camera angles from 0° to 75°, and all-day, all-year temporal variations under rich weather and illumination conditions. To ensure high registration quality, we design a semi-automated annotation pipeline that assigns reliable pixel-level ground truth to each triplet. In addition, each triplet is annotated with six imaging condition attributes, enabling benchmarking of registration robustness under real-world deployment settings. To further support downstream tasks, we provide object-level annotations on all registered images, covering 11 object categories with 77,753 visible and 78,409 infrared bounding boxes. We believe ATR-UMMIM will serve as a foundational benchmark for advancing multimodal registration, fusion, and perception in real-world UAV scenarios. The dataset can be downloaded from https://github.com/supercpy/ATR-UMMIM.
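As a minimal sketch of how pixel-level registration ground truth like ATR-UMMIM's can be used for evaluation, the snippet below computes the mean corner error, a common registration metric: image corners are warped by an estimated and a reference transform, and the average displacement between the two is reported. This assumes the ground truth is expressible as a 3x3 homography; the dataset's actual ground-truth format and the paper's evaluation protocol may differ, and the function names here are illustrative, not from the benchmark's tooling.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of pixel coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # back to Cartesian

def mean_corner_error(H_est, H_gt, width, height):
    """Mean Euclidean distance between the four image corners warped by the
    estimated homography vs. the ground-truth homography (in pixels)."""
    corners = np.array([[0, 0], [width, 0],
                        [width, height], [0, height]], dtype=float)
    diff = warp_points(H_est, corners) - warp_points(H_gt, corners)
    return float(np.linalg.norm(diff, axis=1).mean())

# Example: an estimate that is a pure (3, 4)-pixel translation away from
# an identity ground truth yields a 5-pixel mean corner error.
H_gt = np.eye(3)
H_est = np.eye(3)
H_est[0, 2], H_est[1, 2] = 3.0, 4.0
print(mean_corner_error(H_est, H_gt, 640, 512))  # → 5.0
```

Thresholding this error (e.g., registration counted correct below a few pixels) is one straightforward way to benchmark robustness across the dataset's imaging-condition attributes.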