DRFusion: Drift-Resilient Temporally Consistent Infrared-Visible Video Fusion

📅 2026-05-25
🤖 AI Summary
This work addresses the challenges of ghosting and drift in infrared and visible video fusion, which arise from temporal misalignment, geometric rigidity, and error accumulation in diffusion models. The authors reformulate the fusion task as a history-conditioned motion generation problem and propose a spectral filtering framework that implicitly models motion dynamics to circumvent explicit alignment. Key innovations include stable historical guidance, a soft temporal anchoring mechanism, and a decoupled structure-motion adaptive strategy, complemented by a two-stage training scheme and latent space optimization. The method achieves state-of-the-art performance in both fusion quality and temporal consistency, effectively suppressing artifacts and drift.
📝 Abstract
Infrared and visible video fusion is essential for achieving comprehensive perception in dynamic scenes. However, maintaining temporal consistency remains a formidable challenge. Conventional methods relying on optical flow often suffer from geometric rigidity and ghosting artifacts. Moreover, standard diffusion-based fusion models typically operate in a frame-by-frame manner; when extended to autoregressive settings, they lack intrinsic temporal constraints and are prone to severe error accumulation and drifting, where minor artifacts amplify over time. To address these limitations, we propose a drift-resilient video fusion method that reformulates the task as history-conditioned motion generation. We introduce Stabilized History Guidance and Soft Temporal Anchoring to reframe temporal consistency as spectral filtering, implicitly aggregating motion dynamics without rigid alignment. Furthermore, our Decoupled Structure-Motion Adaptation strategy bridges pre-trained priors and structural constraints via two-stage training and latent refinement. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both fusion quality and temporal stability.
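As a loose illustration of the filtering view mentioned in the abstract, the toy sketch below applies an exponential moving average (a first-order low-pass filter) to a noisy per-frame scalar and measures how much frame-to-frame flicker it removes. This is an assumption-laden stand-in, not the authors' Stabilized History Guidance or Soft Temporal Anchoring: the abstract does not give their formulation, and the names `ema_filter`, `flicker`, `make_noisy_signal`, and the blend weight `alpha` are hypothetical.

```python
import random


def ema_filter(frames, alpha=0.2):
    """Blend each new value with the running history (first-order low-pass).

    Hypothetical helper: `alpha` is the weight on the current frame,
    and (1 - alpha) is the weight on the accumulated history.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    state = None
    for x in frames:
        # The first frame has no history, so it initializes the state.
        state = x if state is None else alpha * x + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


def flicker(seq):
    """Mean absolute frame-to-frame difference, a crude temporal-instability score."""
    if len(seq) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


def make_noisy_signal(n=200, seed=0):
    """Slow linear trend plus per-frame Gaussian noise (toy stand-in for a pixel value)."""
    rng = random.Random(seed)
    return [0.01 * t + rng.gauss(0.0, 0.5) for t in range(n)]


noisy = make_noisy_signal()
smooth = ema_filter(noisy, alpha=0.2)
print("flicker before:", flicker(noisy))
print("flicker after: ", flicker(smooth))
```

A smaller `alpha` leans harder on history and smooths more, at the cost of lagging behind genuine scene changes, which is the general tension any history-weighted scheme has to balance.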
Problem

Research questions and friction points this paper is trying to address.

temporal consistency
infrared-visible video fusion
error accumulation
drifting
ghosting artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

temporal consistency
diffusion-based fusion
history-conditioned motion generation
soft temporal anchoring
drift resilience
Xingyuan Li
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Haoyuan Xu
School of Software Technology & DUT-RU International School of ISE, Dalian University of Technology, Dalian, China
Shulin Li
School of Software Technology & DUT-RU International School of ISE, Dalian University of Technology, Dalian, China
Xiang Chen
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Zhiying Jiang
University of Waterloo
Natural Language Processing, Machine Learning
Jinyuan Liu
Dalian University of Technology
image processing, deep learning, image fusion