Fusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimization

📅 2026-05-07
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing infrared and visible image fusion methods struggle to accommodate the heterogeneous preferences of human vision and machine vision at the same time, and they lack adaptive alignment capabilities. To address this, this work proposes DPOFusion, a framework that introduces Direct Preference Optimization (DPO) into image fusion for the first time. By integrating a Property-Aligned Latent Diffusion Model (PALDM) with a Preference-Controllable Latent Diffusion Model (PCLDM), DPOFusion applies instance direct preference optimization (IDPO) to enable task-guided, preference-adaptive fusion. The method aligns fusion results with preferences from multiple sources, including human observers, vision-language models, and downstream task networks, and the authors report state-of-the-art performance in preference alignment accuracy, fusion quality, and transferability to downstream tasks.
📝 Abstract
As a key technique in multi-modal processing, infrared and visible image fusion (IVIF) plays a crucial role in integrating complementary spectral information for visual enhancement and downstream vision tasks. Despite remarkable progress, existing methods struggle to flexibly accommodate heterogeneous demands. Achieving adaptive fusion that aligns with various preferences from both human and machine vision remains an open and challenging problem. To address this challenge, we propose DPOFusion, a direct preference optimization (DPO) framework integrating the property-aligned latent diffusion model (PALDM) and the preference-controllable latent diffusion model (PCLDM), enabling task-guided, preference-adaptive IVIF for both human and machine vision. The PALDM leverages a latent fusion prior and a joint conditional loss to generate diverse candidate fusion results with various properties. PCLDM is subsequently fine-tuned via instance direct preference optimization (IDPO), enabling direct control of the final fusion results with heterogeneous preference signals. Experimental results demonstrate that our framework not only attains precise preference alignment among humans, vision-language models, and task-driven networks, but also sets a new benchmark for adaptive fusion quality and task-oriented transferability.
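The abstract does not spell out the IDPO objective, so the following is only a minimal sketch of the generic DPO-style loss commonly used for diffusion models, not the authors' formulation. It assumes that, for one preferred and one rejected fusion candidate, the denoising errors of the fine-tuned model and of a frozen reference model are already available as scalars. The function name `diffusion_dpo_loss`, its parameters, and the default `beta` are hypothetical and chosen for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def diffusion_dpo_loss(
    err_w_policy: float,  # denoising error of the fine-tuned model on the preferred sample
    err_w_ref: float,     # denoising error of the frozen reference on the preferred sample
    err_l_policy: float,  # denoising error of the fine-tuned model on the rejected sample
    err_l_ref: float,     # denoising error of the frozen reference on the rejected sample
    beta: float = 1.0,    # hypothetical preference-strength hyperparameter
) -> float:
    """Generic DPO-style loss for one (preferred, rejected) pair.

    The loss falls when the fine-tuned model improves on the reference
    more for the preferred sample than for the rejected one.
    """
    win_gap = err_w_policy - err_w_ref
    lose_gap = err_l_policy - err_l_ref
    logit = -beta * (win_gap - lose_gap)
    # -log(sigmoid(logit)) equals softplus(-logit)
    return softplus(-logit)


if __name__ == "__main__":
    # Fine-tuned model identical to the reference: loss equals log(2).
    print(diffusion_dpo_loss(0.5, 0.5, 0.7, 0.7))
    # Fine-tuned model denoises the preferred sample better: loss drops.
    print(diffusion_dpo_loss(0.3, 0.5, 0.7, 0.7))
```

In this sketch the preference signal only determines which candidate counts as preferred, so the same loss could in principle be driven by human ratings, vision-language model judgments, or downstream task scores.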
Problem

Research questions and friction points this paper is trying to address.

infrared and visible image fusion
heterogeneous demands
preference alignment
adaptive fusion
multi-modal processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct Preference Optimization
Latent Diffusion Model
Infrared-Visible Image Fusion
Preference Alignment
Adaptive Fusion
Weijian Su
School of Computer Science and Technology, Dalian University of Technology; Key Laboratory of Social Computing and Cognitive Intelligence (Dalian University of Technology), Ministry of Education
Songqian Zhang
School of Computer Science and Technology, Dalian University of Technology; Key Laboratory of Social Computing and Cognitive Intelligence (Dalian University of Technology), Ministry of Education
Yuqi Han
School of Computer Science and Technology, Dalian University of Technology; Key Laboratory of Social Computing and Cognitive Intelligence (Dalian University of Technology), Ministry of Education
Jian Zhuang
School of Computer Science and Technology, Dalian University of Technology; Key Laboratory of Social Computing and Cognitive Intelligence (Dalian University of Technology), Ministry of Education
Yongdong Huang
Institute of Image Processing and Understanding, North Minzu University
Qiang Zhang
Dalian University
Big data analysis and processing; Machine behavior and human-machine collaboration