EvaNet: Towards More Efficient and Consistent Infrared and Visible Image Fusion Assessment

๐Ÿ“… 2026-04-03
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing evaluation metrics for infrared and visible image fusion are often borrowed from other vision tasks, suffering from both high computational cost and limited fidelity in reflecting perceptual fusion quality. To address this, this work proposes the first unified assessment framework tailored specifically for this task. It employs a lightweight neural network to approximate conventional metrics and adopts a โ€œdivide-and-conquerโ€ strategy by decomposing the fused image into infrared and visible components, evaluating each for information preservation separately. The framework further integrates no-reference scoring with downstream task performance, leveraging contrastive learning and scene-aware guidance derived from large language models to better align with human visual preferences. Experiments on multiple benchmark datasets demonstrate up to a 1000ร— acceleration in evaluation speed while significantly improving consistency with human perception.
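The "divide-and-conquer" idea above can be sketched in a few lines. This is a toy illustration, not the paper's method: the learned decomposition is replaced here by a simple least-squares split of the fused image into IR-aligned and visible-aligned parts, and "information preservation" is approximated by normalized correlation against each source. All function names and the scoring choice are assumptions for illustration only.

```python
import numpy as np

def preservation_score(component, source):
    """Toy information-preservation score: normalized correlation
    between a decomposed component and its source image."""
    c = component - component.mean()
    s = source - source.mean()
    denom = np.sqrt((c ** 2).sum() * (s ** 2).sum()) + 1e-12
    return float((c * s).sum() / denom)

def divide_and_conquer_eval(fused, ir, vis):
    """Split the fused image into IR- and visible-aligned parts by
    least squares (a crude stand-in for the paper's learned
    decomposition), then score each part against its source."""
    A = np.stack([ir.ravel(), vis.ravel()], axis=1)       # (N, 2) design matrix
    w, *_ = np.linalg.lstsq(A, fused.ravel(), rcond=None)  # mixing weights
    ir_part, vis_part = w[0] * ir, w[1] * vis
    return {"ir_preservation": preservation_score(ir_part, ir),
            "vis_preservation": preservation_score(vis_part, vis)}
```

For a fused image that is an even blend of both sources, both preservation scores come out near 1, matching the intuition that such a fusion retains information from each modality.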
๐Ÿ“ Abstract
Evaluation is essential in image fusion research, yet most existing metrics are directly borrowed from other vision tasks without proper adaptation. These traditional metrics, often based on complex image transformations, not only fail to capture the true quality of fusion results but are also computationally demanding. To address these issues, we propose a unified evaluation framework specifically tailored for image fusion. At its core is a lightweight network designed to efficiently approximate widely used metrics, following a divide-and-conquer strategy. Unlike conventional approaches that directly assess similarity between fused and source images, we first decompose the fusion result into infrared and visible components. The evaluation model then measures the degree of information preservation in these separated components, effectively disentangling the fusion evaluation process. During training, we incorporate a contrastive learning strategy and inform our evaluation model with perceptual scene assessments provided by a large language model. Finally, we propose the first consistency evaluation framework, which measures the alignment between image fusion metrics and human visual perception, using both independent no-reference scores and downstream task performance as objective references. Extensive experiments show that our learning-based evaluation paradigm delivers both superior efficiency (up to 1,000 times faster) and greater consistency across a range of standard image fusion benchmarks. Our code will be publicly available at https://github.com/AWCXV/EvaNet.
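The consistency evaluation described in the abstract can be pictured as a rank-agreement check between a fusion metric and an objective reference (no-reference quality scores or downstream task accuracy). The sketch below uses Spearman rank correlation as one plausible agreement measure; the abstract does not specify the exact statistic, so this choice and all names here are illustrative assumptions.

```python
import numpy as np

def rankdata(x):
    """Ranks (1-based; ties broken by sort order) of a 1-D array."""
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

def consistency(metric_scores, reference_scores):
    """Spearman rank correlation between a fusion metric's scores and
    an objective reference (e.g. no-reference quality scores or
    downstream detection accuracy). +1 means the metric ranks fusion
    results exactly as the reference does; -1 means it inverts them."""
    r1 = rankdata(np.asarray(metric_scores, dtype=float))
    r2 = rankdata(np.asarray(reference_scores, dtype=float))
    r1 -= r1.mean()
    r2 -= r2.mean()
    return float((r1 * r2).sum() / np.sqrt((r1 ** 2).sum() * (r2 ** 2).sum()))
```

Under this view, a metric is "consistent" when it orders a set of fusion results the same way the objective reference does, regardless of the absolute score scales.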
Problem

Research questions and friction points this paper is trying to address.

image fusion evaluation
infrared and visible image fusion
evaluation consistency
human visual perception
fusion metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

image fusion evaluation
lightweight network
contrastive learning
perceptual consistency
infrared-visible image fusion
๐Ÿ”Ž Similar Papers
No similar papers found.