Bridging Human Evaluation to Infrared and Visible Image Fusion

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing infrared and visible image fusion methods: they rely on handcrafted loss functions and objective metrics that often fail to align with human visual preferences, which hinders their applicability in human-centric scenarios such as surveillance and autonomous driving. To bridge this gap, the authors propose a reinforcement learning framework grounded in human feedback. They construct the first large-scale, multidimensional dataset of subjective quality ratings and artifact annotations for infrared-visible image fusion (IVIF), and develop a domain-specific reward model by combining fine-tuned large language models with expert evaluations. Guided by this reward model, the authors fine-tune the fusion network with Group Relative Policy Optimization, significantly enhancing perceptual quality and aesthetic consistency. The approach achieves state-of-the-art performance across multiple benchmarks, establishing a new paradigm for human-centered image fusion.

📝 Abstract
Infrared and visible image fusion (IVIF) integrates complementary modalities to enhance scene perception. Current methods predominantly focus on optimizing handcrafted losses and objective metrics, often resulting in fusion outcomes that do not align with human visual preferences. This challenge is further exacerbated by the ill-posed nature of IVIF, which severely limits its effectiveness in human perceptual environments such as security surveillance and driver assistance systems. To address these limitations, we propose a feedback reinforcement framework that bridges human evaluation to infrared and visible image fusion. To address the lack of human-centric evaluation metrics and data, we introduce the first large-scale human feedback dataset for IVIF, containing multidimensional subjective scores and artifact annotations, and enriched by a fine-tuned large language model with expert review. Based on this dataset, we design a domain-specific reward function and train a reward model to quantify perceptual quality. Guided by this reward, we fine-tune the fusion network through Group Relative Policy Optimization, achieving state-of-the-art performance that better aligns fused images with human aesthetics. Code is available at https://github.com/ALKA-Wind/EVAFusion.
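The abstract's central mechanism is scoring a group of sampled fusion outputs with the learned reward model and normalizing each reward against its group, which is the "group relative" part of Group Relative Policy Optimization. A minimal sketch of that advantage computation, assuming per-sample scalar reward scores (function and variable names are illustrative, not the authors' code):

```python
# Hypothetical sketch of GRPO's group-relative advantage, as used to
# fine-tune a fusion network against a learned reward model.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled output's reward against its group:
    advantage_i = (r_i - mean(group)) / (std(group) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# For one infrared/visible input pair, sample a group of fused
# candidates, score each with the reward model, then weight the
# policy gradient by these advantages instead of a learned value
# baseline (GRPO's replacement for a critic).
rewards = [0.62, 0.75, 0.58, 0.81]  # illustrative reward-model scores
adv = group_relative_advantages(rewards)
```

Because advantages are centered within each group, above-average candidates are reinforced and below-average ones suppressed without training a separate value network.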
Problem

Research questions and friction points this paper is trying to address.

infrared and visible image fusion
human visual preference
ill-posed problem
perceptual quality
human evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

human feedback
infrared-visible image fusion
reward modeling
perceptual alignment
reinforcement learning
Jinyuan Liu
Dalian University of Technology
image processing, deep learning, image fusion
Xingyuan Li
College of Computer Science and Technology, Zhejiang University
Qingyun Mei
School of Software Technology & DUT-RU International School of ISE, Dalian University of Technology
Haoyuan Xu
School of Software Technology & DUT-RU International School of ISE, Dalian University of Technology
Zhiying Jiang
University of Waterloo
Natural Language Processing, Machine Learning
Long Ma
Dalian University of Technology
Computer Vision, Image Processing
Risheng Liu
Professor, Dalian University of Technology
computer vision, machine learning, optimization
Xin Fan
Professor at Dalian University of Technology
Image processing, Diffusion Tensor Imaging