One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image fusion methods rely on high-level semantic task interactions, suffering from semantic gaps that limit generalizability and universality. This paper proposes a low-level vision–driven paradigm—specifically, pixel-level reconstruction—as a foundation for universal fusion, eliminating task-specific semantic modeling. We introduce GIFNet, a unified representation architecture trained via multi-task joint learning under pixel-level supervision. Our key contributions are: (i) the first task-agnostic fusion framework guided by low-level task interactions; (ii) zero-shot generalization of a single model to unseen modality pairs (e.g., infrared/visible-light, MRI/PET); and (iii) emergent capability for single-modality image enhancement. GIFNet achieves state-of-the-art performance across diverse cross-modal fusion benchmarks, demonstrating superior generalizability, architectural unity, and practical applicability.

📝 Abstract
Advanced image fusion methods mostly prioritise high-level missions, where task interaction struggles with semantic gaps, requiring complex bridging mechanisms. In contrast, we propose to leverage low-level vision tasks from digital photography fusion, allowing for effective feature interaction through pixel-level supervision. This new paradigm provides strong guidance for unsupervised multimodal fusion without relying on abstract semantics, enhancing task-shared feature learning for broader applicability. Owing to the hybrid image features and enhanced universal representations, the proposed GIFNet supports diverse fusion tasks, achieving high performance across both seen and unseen scenarios with a single model. Uniquely, experimental results reveal that our framework also supports single-modality enhancement, offering superior flexibility for practical applications. Our code will be available at https://github.com/AWCXV/GIFNet.
Problem

Research questions and friction points this paper is trying to address.

Low-level task interaction in image fusion
Unsupervised multimodal fusion without semantics
Single model for diverse fusion tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages low-level vision tasks
Enhances task-shared feature learning
Supports diverse fusion tasks
Chunyang Cheng
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
Tianyang Xu
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
Zhenhua Feng
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
Xiaojun Wu
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
Zhangyong Tang
Jiangnan University
Hui Li
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
Zeyang Zhang
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
Sara Atito
Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK
Muhammad Awais
Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK
Josef Kittler
University of Surrey
Field of study: engineering