Hybrid Fusion: One-Minute Efficient Training for Zero-Shot Cross-Domain Image Fusion

📅 2026-02-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing image fusion methods are often limited by insufficient adaptability or inefficiencies in deep learning model training, particularly due to mismatches between training and inference resolutions. This work proposes a hybrid fusion framework that, for the first time, decouples policy learning from pixel synthesis: a lightweight, learnable U-Net generates dynamic guidance maps to drive a fixed Laplacian pyramid kernel for linear pixel-level fusion. The approach enables full-resolution end-to-end training without reliance on external models, achieving state-of-the-art performance within 1–2 minutes of training from scratch. Moreover, it demonstrates zero-shot cross-domain generalization, effectively balancing efficiency, fidelity, and robustness across diverse tasks such as infrared-visible and medical image fusion.

Technology Category

Application Category

📝 Abstract
Image fusion seeks to integrate complementary information from multiple sources into a single, superior image. While traditional methods are fast, they lack adaptability and performance. Conversely, deep learning approaches achieve state-of-the-art (SOTA) results but suffer from critical inefficiencies: their reliance on slow, resource-intensive, patch-based training introduces a significant gap with full-resolution inference. We propose a novel hybrid framework that resolves this trade-off. Our method utilizes a learnable U-Net to generate a dynamic guidance map that directs a classic, fixed Laplacian pyramid fusion kernel. This decoupling of policy learning from pixel synthesis enables remarkably efficient full-resolution training, eliminating the train-inference gap. Consequently, our model achieves SOTA-comparable performance in about one minute on a RTX 4090 or two minutes on a consumer laptop GPU from scratch without any external model and demonstrates powerful zero-shot generalization across diverse tasks, from infrared-visible to medical imaging. By design, the fused output is linearly constructed solely from source information, ensuring high faithfulness for critical applications. The codes are available at https://github.com/Zirconium233/HybridFusion
Problem

Research questions and friction points this paper is trying to address.

image fusion
zero-shot
cross-domain
training efficiency
full-resolution inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Fusion
Zero-Shot Generalization
Full-Resolution Training
Laplacian Pyramid
U-Net Guidance
🔎 Similar Papers