🤖 AI Summary
This work proposes a novel long-range collaborative attention framework for HDR video reconstruction that eliminates the need for explicit frame alignment. Addressing the ghosting and flickering artifacts commonly caused by existing alignment-based methods in complex dynamic scenes, the approach uses a medium-exposure frame as an anchor to dynamically aggregate reliable radiance information from unaligned neighboring frames. Alignment-free feature routing is achieved through a collaborative attention mechanism, while bidirectional long-range temporal modeling combined with a learnable global sequence solver ensures consistent temporal coherence across the entire video sequence. By departing from the conventional align-then-fuse paradigm, the method achieves state-of-the-art visual quality and temporal stability while remaining computationally efficient.
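The summary does not spell out how the collaborative attention is implemented, but one minimal way to picture the alignment-free feature routing is cross-attention in which the medium-exposure anchor frame supplies the queries and an unaligned neighboring exposure supplies the keys and values. The sketch below is a hypothetical PyTorch illustration of that idea, not the authors' code; the module and tensor names (`CollaborativeAttention`, `anchor_feat`, `neighbor_feat`) are assumptions.

```python
# Minimal sketch (assumed, not the authors' implementation): alignment-free
# feature routing via cross-attention. The medium-exposure anchor frame acts
# as the query; an unaligned neighboring exposure acts as key/value, so
# radiance cues are harvested without explicit warping.
import torch
import torch.nn as nn


class CollaborativeAttention(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, anchor_feat: torch.Tensor, neighbor_feat: torch.Tensor) -> torch.Tensor:
        # anchor_feat:   (B, C, H, W) features of the medium-exposure anchor frame
        # neighbor_feat: (B, C, H, W) features of an unaligned high/low-exposure frame
        b, c, h, w = anchor_feat.shape
        q = anchor_feat.flatten(2).transpose(1, 2)      # (B, H*W, C) queries from the anchor
        kv = neighbor_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) keys/values from the neighbor
        routed, _ = self.attn(q, kv, kv)                # attention routes reliable radiance cues
        fused = self.norm(q + routed)                   # residual injection into the anchor stream
        return fused.transpose(1, 2).reshape(b, c, h, w)


# Toy usage: fuse cues from one unaligned neighbor into the anchor features.
if __name__ == "__main__":
    attn = CollaborativeAttention(channels=64)
    anchor = torch.randn(1, 64, 32, 32)
    neighbor = torch.randn(1, 64, 32, 32)
    print(attn(anchor, neighbor).shape)  # torch.Size([1, 64, 32, 32])
```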
📝 Abstract
Prevailing High Dynamic Range (HDR) video reconstruction methods are fundamentally trapped in a fragile alignment-and-fusion paradigm. While explicit spatial alignment can successfully recover fine details in controlled environments, it becomes a severe bottleneck in unconstrained dynamic scenes. By forcing rigid alignment across unpredictable motions and varying exposures, these methods inevitably translate registration errors into severe ghosting artifacts and temporal flickering. In this paper, we rethink this conventional prerequisite. Recognizing that explicit alignment is inherently vulnerable to real-world complexities, we propose LoCAtion, a Long-time Collaborative Attention framework that reformulates HDR video reconstruction from a fragile spatial warping task into a robust, alignment-free collaborative feature routing problem. Guided by this new formulation, our architecture explicitly decouples the highly entangled reconstruction task. Rather than struggling to rigidly warp neighboring frames, we anchor the scene on a continuous medium-exposure backbone and utilize collaborative attention to dynamically harvest and inject reliable irradiance cues from unaligned exposures. Furthermore, we introduce a learned global sequence solver. By leveraging bidirectional context and long-range temporal modeling, it propagates corrective signals and structural features across the entire sequence, inherently enforcing whole-video coherence and eliminating jitter. Extensive experiments demonstrate that LoCAtion achieves state-of-the-art visual quality and temporal stability, offering a highly competitive balance between accuracy and computational efficiency.
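The abstract describes the learned global sequence solver only at a high level. One way to picture bidirectional, long-range temporal propagation is a bidirectional recurrent pass over per-frame descriptors whose output is broadcast back onto every frame as a corrective signal. The sketch below is an assumed stand-in for the authors' solver; `GlobalSequenceSolver` and its pooling-based descriptor are illustrative choices, not details from the paper.

```python
# Minimal sketch (assumed, not the paper's solver): bidirectional long-range
# temporal modeling over the whole sequence, propagating a corrective signal
# to every frame to enforce whole-video coherence.
import torch
import torch.nn as nn


class GlobalSequenceSolver(nn.Module):
    def __init__(self, channels: int, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, channels)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, C, H, W) per-frame features for the entire video
        b, t, c, h, w = frame_feats.shape
        desc = frame_feats.mean(dim=(3, 4))        # (B, T, C) global per-frame descriptors
        context, _ = self.rnn(desc)                # (B, T, 2*hidden) bidirectional context
        correction = self.proj(context)            # (B, T, C) per-frame corrective signal
        # Broadcast the sequence-level correction onto every spatial location,
        # nudging all frames toward a temporally coherent reconstruction.
        return frame_feats + correction.view(b, t, c, 1, 1)
```

Because the recurrence runs over the full sequence in both directions, every frame's correction is conditioned on the whole video rather than a local window, which is the property the abstract attributes to the solver.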