🤖 AI Summary
This work addresses a limitation of existing infrared and visible image fusion methods: they rely on handcrafted loss functions and objective metrics that often fail to align with human visual preferences, which hinders their applicability in human-centric scenarios such as surveillance and autonomous driving. To bridge this gap, the authors propose a reinforcement learning framework grounded in human feedback. They construct the first large-scale, multidimensional dataset of subjective quality ratings and artifact annotations for infrared-visible image fusion (IVIF), and build a domain-specific reward model by combining fine-tuned large language models with expert evaluations. Using Group Relative Policy Optimization, they fine-tune the fusion network to markedly improve perceptual quality and aesthetic consistency. The approach achieves state-of-the-art performance across multiple benchmarks, establishing a new paradigm for human-centered image fusion.
📝 Abstract
Infrared and visible image fusion (IVIF) integrates complementary modalities to enhance scene perception. Current methods predominantly optimize handcrafted losses and objective metrics, often producing fusion results that do not align with human visual preferences. This challenge is exacerbated by the ill-posed nature of IVIF and severely limits its effectiveness in human-facing applications such as security surveillance and driver assistance systems. To address these limitations, we propose a human-feedback reinforcement framework that connects human evaluation with infrared and visible image fusion. To remedy the lack of human-centric evaluation metrics and data, we introduce the first large-scale human feedback dataset for IVIF, containing multidimensional subjective scores and artifact annotations, enriched by a fine-tuned large language model with expert review. Based on this dataset, we design a domain-specific reward function and train a reward model to quantify perceptual quality. Guided by this reward, we fine-tune the fusion network through Group Relative Policy Optimization, achieving state-of-the-art performance that better aligns fused images with human aesthetics. Code is available at https://github.com/ALKA-Wind/EVAFusion.
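The abstract does not give implementation details, but the core idea of Group Relative Policy Optimization (GRPO) is to score a group of candidate outputs with the reward model and normalize each reward against its group's statistics, avoiding a separate value critic. A minimal sketch of that group-relative advantage step (function name and reward values are illustrative, not from the paper):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages, as used in GRPO-style training:
    each sample's reward is normalized by the mean and standard
    deviation of its own sampling group (no learned value critic)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Hypothetical reward-model scores for four fused candidates of one scene.
adv = grpo_advantages([0.62, 0.71, 0.55, 0.68])
# Candidates scored above the group mean receive positive advantage,
# so the policy update pushes the fusion network toward them.
```

In a full pipeline, these advantages would weight the policy-gradient update of the fusion network, with the reward model supplying the per-candidate scores.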