🤖 AI Summary
This work addresses the limitations of existing infrared-visible image fusion methods, which are confined to 2D fusion under fixed viewpoints and struggle to preserve global structure and critical modality-specific features. To overcome these limitations, the authors propose the first 3D fusion framework that supports free-viewpoint rendering. The approach reconstructs the scene using 3D Gaussian representations, incorporates a Cross-Modal Adjustment (CMA) module to mitigate modality conflicts, and introduces a fusion-aware loss that jointly optimizes geometry and appearance. Coupled with differentiable rendering, the framework directly generates high-quality fused images from arbitrary viewpoints. Experimental results show that the method significantly outperforms state-of-the-art techniques in both preserving dual-modality characteristics and enhancing overall fusion quality.
📝 Abstract
Infrared-visible image fusion aims to integrate complementary infrared and visible information into a single fused image. Existing 2D fusion methods fuse images captured from fixed camera viewpoints, which prevents a comprehensive understanding of complex scenes and leads to the loss of critical scene information. To address this limitation, we propose a novel Infrared-Visible Gaussian Fusion (IVGF) framework, which reconstructs scene geometry from multimodal 2D inputs and enables direct rendering of fused images. Specifically, we propose a cross-modal adjustment (CMA) module that modulates the opacity of the Gaussians to mitigate cross-modal conflicts. Moreover, to preserve the distinctive features of both modalities, we introduce a fusion loss that guides the optimization of the CMA module, ensuring that the fused image retains the critical characteristics of each modality. Comprehensive qualitative and quantitative experiments demonstrate the effectiveness of the proposed method.
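The abstract does not spell out how the CMA module works internally. As a rough illustration of the stated idea, per-Gaussian opacity modulation driven by cross-modal cues, here is a minimal PyTorch sketch; the class name `CrossModalAdjustment`, the feature dimensions, and the MLP layout are all assumptions made for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CrossModalAdjustment(nn.Module):
    """Hypothetical sketch of a CMA-style module: predicts a per-Gaussian
    opacity scale from concatenated infrared and visible appearance
    features, so Gaussians whose two modalities conflict can be
    down-weighted. Dimensions and layers are assumptions, not the paper's."""

    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 1),
            nn.Sigmoid(),  # scale in (0, 1)
        )

    def forward(self, opacity, ir_feat, vis_feat):
        # opacity: (N, 1) base opacity of N Gaussians
        # ir_feat, vis_feat: (N, feat_dim) per-Gaussian modality features
        scale = self.mlp(torch.cat([ir_feat, vis_feat], dim=-1))
        return opacity * scale  # modulated opacity used at render time
```

Because the predicted scale lies in (0, 1), this sketch can only attenuate a Gaussian's base opacity, which is one plausible way to suppress Gaussians whose infrared and visible appearances disagree while leaving consistent ones untouched.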
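The fusion loss itself is likewise not specified here. A formulation common in the infrared-visible fusion literature combines an intensity term and a gradient term, each matched against the element-wise maximum of the two source images so that salient infrared targets and visible texture both survive; the sketch below uses that generic formulation, with the weight `w_grad` and the Sobel-based gradient operator as assumptions rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def sobel_grad(img):
    """Per-pixel gradient magnitude via Sobel filters; img: (B, 1, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return gx.abs() + gy.abs()

def fusion_loss(fused, ir, vis, w_grad: float = 1.0):
    """Generic fusion objective (not necessarily IVGF's): the fused image
    should match the element-wise max of the source intensities and the
    max of their gradient magnitudes."""
    l_int = F.l1_loss(fused, torch.maximum(ir, vis))
    l_grad = F.l1_loss(sobel_grad(fused),
                       torch.maximum(sobel_grad(ir), sobel_grad(vis)))
    return l_int + w_grad * l_grad
```

In the framework described above, a loss of this kind would be applied to the differentiably rendered fused image, so its gradients flow back through the renderer into the Gaussian parameters and the CMA module.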