🤖 AI Summary
Existing infrared–visible image fusion methods decouple fusion from downstream tasks, leading to a misalignment between fusion quality and high-level perceptual performance. To address this, we propose a discriminative cross-dimensional evolutionary learning framework that, for the first time, formulates fusion quality enhancement and downstream task optimization (e.g., object detection) as a multi-objective co-evolutionary problem. Our key contributions include: (1) a dynamic weight evolutionary algorithm for joint fusion–perception optimization; (2) an encoder–decoder dual-embedding discriminative enhancer; and (3) a cross-dimensional feature interaction module coupled with a multi-objective loss adaptive balancing strategy. Evaluated on three major benchmarks, our method achieves an average 9.32% improvement in visual quality while significantly boosting performance on high-level vision tasks such as object detection. The source code is publicly available.
📝 Abstract
Infrared and visible image fusion integrates information from distinct spectral bands to enhance image quality by leveraging the strengths and mitigating the limitations of each modality. Existing approaches typically treat image fusion and subsequent high-level tasks as separate processes, resulting in fused images that offer only marginal gains in task performance and fail to provide constructive feedback for optimizing the fusion process. To overcome these limitations, we propose a Discriminative Cross-Dimension Evolutionary Learning Framework, termed DCEvo, which simultaneously enhances visual quality and perception accuracy. Leveraging the robust search capabilities of Evolutionary Learning, our approach formulates the optimization of dual tasks as a multi-objective problem by employing an Evolutionary Algorithm (EA) to dynamically balance loss function parameters. Inspired by visual neuroscience, we integrate a Discriminative Enhancer (DE) within both the encoder and decoder, enabling the effective learning of complementary features from different modalities. Additionally, our Cross-Dimensional Embedding (CDE) block facilitates mutual enhancement between high-dimensional task features and low-dimensional fusion features, ensuring a cohesive and efficient feature integration process. Experimental results on three benchmarks demonstrate that our method significantly outperforms state-of-the-art approaches, achieving an average improvement of 9.32% in visual quality while also enhancing subsequent high-level tasks. The code is available at https://github.com/Beate-Suy-Zhang/DCEvo.
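The core idea of using an Evolutionary Algorithm to dynamically balance loss-function parameters can be illustrated with a minimal toy sketch. This is not DCEvo's actual algorithm (which co-evolves fusion and detection objectives on real networks); `evolve_loss_weights` and `eval_fn` are illustrative stand-ins, with `eval_fn` playing the role of the combined fusion-plus-perception loss to be minimized over candidate weight vectors:

```python
import random

def evolve_loss_weights(eval_fn, n_weights=2, pop_size=8, generations=20, seed=0):
    """Toy evolutionary search over loss-weight vectors.

    eval_fn maps a weight vector (entries summing to 1) to a scalar
    fitness to MINIMIZE, standing in for the combined multi-objective
    loss (e.g., fusion quality + detection loss).
    """
    rng = random.Random(seed)

    def normalize(w):
        s = sum(w)
        return [x / s for x in w]

    # Random initial population of normalized weight vectors.
    pop = [normalize([rng.random() + 1e-6 for _ in range(n_weights)])
           for _ in range(pop_size)]

    for _ in range(generations):
        # Elitist selection: keep the best half of the population.
        elite = sorted(pop, key=eval_fn)[: pop_size // 2]
        # Gaussian mutation of each elite parent, re-normalized.
        children = [normalize([max(1e-6, w + rng.gauss(0.0, 0.1))
                               for w in parent])
                    for parent in elite]
        pop = elite + children

    return min(pop, key=eval_fn)

# Toy objective whose optimum is the weight vector (0.7, 0.3).
best = evolve_loss_weights(lambda w: (w[0] - 0.7) ** 2 + (w[1] - 0.3) ** 2)
```

In the full framework this search would wrap a training loop, with fitness measured by validation metrics for both fusion quality and the downstream detector, rather than a closed-form objective.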