AI Summary
This study investigates how fusing visible-light and thermal infrared imagery enhances automated detection of wildlife, specifically great blue herons and their nests, addressing the alignment and fusion challenges that arise from inter-modal differences in field of view and spatial resolution. We propose a deep-learning-based cross-modal auto-registration method and comparatively evaluate early fusion (via principal component analysis, PCA) and late fusion (via a classification and regression tree, CART), with YOLO11n as the backbone detector. Results demonstrate that dual-modal fusion outperforms single-modality visible-light detection: late fusion improves the F1 score for the "occupied nest" class from 90.2% to 93.0%, while identifying false positives from either modality with 90% recall. This work empirically validates the discriminative value of thermal infrared cues in complex natural environments and establishes a reproducible technical framework for multimodal remote sensing in biodiversity monitoring.
Abstract
Efficient wildlife monitoring methods are necessary for biodiversity conservation and management. The combination of remote sensing, aerial imagery and deep learning offers promising opportunities to renew or improve existing survey methods. The complementary use of visible (VIS) and thermal infrared (TIR) imagery can add information compared to a single-source image and improve results in an automated detection context. However, the alignment and fusion process can be challenging, especially since visible and thermal images usually have different fields of view (FOV) and spatial resolutions. This research presents a case study on the great blue heron (Ardea herodias) to evaluate the performance of synchronous aerial VIS and TIR imagery for automatically detecting individuals and nests using a YOLO11n model. Two VIS-TIR fusion methods, an early fusion approach and a late fusion approach, were tested and compared to determine whether the addition of the TIR image gives any added value compared to a VIS-only model. VIS and TIR images were automatically aligned using a deep learning model. A principal component analysis fusion method was applied to VIS-TIR image pairs to form the early fusion dataset. A classification and regression tree was used to process the late fusion dataset, based on the detections from the VIS-only and TIR-only trained models. Across all classes, both late and early fusion improved the F1 score compared to the VIS-only model. For the main class, occupied nest, late fusion improved the F1 score from 90.2% (VIS-only) to 93.0%. This model was also able to identify false positives from both sources with 90% recall. Although fusion methods seem to give better results, this approach comes with a limiting TIR FOV and alignment constraints that eliminate data. Using an aircraft-mounted very high-resolution visible sensor could be an interesting option for operationalizing surveys.
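The PCA fusion step for the early fusion dataset can be sketched as follows. This is a minimal illustration of the standard PCA image-fusion scheme, assuming already co-registered, single-channel (grayscale VIS and TIR) inputs; it is not the paper's exact implementation, and the function name is hypothetical:

```python
import numpy as np

def pca_fuse(vis_gray: np.ndarray, tir: np.ndarray) -> np.ndarray:
    """Fuse two co-registered single-channel images by PCA weighting.

    The fusion weights come from the leading eigenvector of the 2x2
    covariance matrix of the two flattened images, so the modality
    with more variance contributes more to the fused result.
    """
    stack = np.stack([vis_gray.ravel(), tir.ravel()]).astype(np.float64)
    cov = np.cov(stack)                     # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    w = np.abs(eigvecs[:, -1])              # leading eigenvector
    w = w / w.sum()                         # normalize weights to sum to 1
    fused = w[0] * vis_gray + w[1] * tir
    return fused.reshape(vis_gray.shape)
```

In practice the fused single-channel image would then be combined with (or substituted into) the VIS channels before being passed to the YOLO11n detector.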
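The late fusion step, deciding from the VIS-only and TIR-only model outputs whether a candidate detection is genuine, can be sketched with scikit-learn's CART implementation. The features (per-modality confidence scores) and the toy training data below are illustrative assumptions, not the paper's actual feature set:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # CART algorithm

# Toy training data: one row per candidate detection, columns are
# [VIS confidence, TIR confidence]; label 1 = true detection,
# label 0 = false positive. Values are made up for illustration.
X = np.array([
    [0.95, 0.90], [0.88, 0.75], [0.92, 0.10],
    [0.20, 0.85], [0.15, 0.10], [0.30, 0.25],
])
y = np.array([1, 1, 1, 0, 0, 0])

cart = DecisionTreeClassifier(max_depth=2, random_state=0)
cart.fit(X, y)

# A detection supported confidently by the VIS model is kept.
keep = cart.predict([[0.90, 0.80]])
```

A shallow tree like this is easy to inspect, which matches the abstract's point that the late fusion model can explicitly flag false positives coming from either source.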