🤖 AI Summary
This work proposes a novel long-range collaborative attention framework for HDR video reconstruction that eliminates the need for explicit frame alignment. Addressing the ghosting and flickering artifacts commonly caused by existing alignment-based methods in complex dynamic scenes, the approach uses a medium-exposure frame as an anchor to dynamically aggregate reliable radiance information from unaligned neighboring frames. Alignment-free feature routing is achieved through a collaborative attention mechanism, while bidirectional long-range temporal modeling combined with a learnable global sequence solver ensures consistent temporal coherence across the entire video sequence. By departing from the conventional align-then-fuse paradigm, the method achieves state-of-the-art visual quality and temporal stability while remaining computationally efficient.
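The summary does not spell out how the collaborative attention is implemented, but one minimal way to picture the alignment-free feature routing is cross-attention in which the medium-exposure anchor frame supplies the queries and an unaligned neighboring exposure supplies the keys and values. The sketch below is a hypothetical PyTorch illustration of that idea, not the authors' code; the module and tensor names (`CollaborativeAttention`, `anchor_feat`, `neighbor_feat`) are assumptions.

```python
# Minimal sketch (assumed, not the authors' implementation): alignment-free
# feature routing via cross-attention. The medium-exposure anchor frame acts
# as the query; an unaligned neighboring exposure acts as key/value, so
# radiance cues are harvested without explicit warping.
import torch
import torch.nn as nn


class CollaborativeAttention(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, anchor_feat: torch.Tensor, neighbor_feat: torch.Tensor) -> torch.Tensor:
        # anchor_feat:   (B, C, H, W) features of the medium-exposure anchor frame
        # neighbor_feat: (B, C, H, W) features of an unaligned high/low-exposure frame
        b, c, h, w = anchor_feat.shape
        q = anchor_feat.flatten(2).transpose(1, 2)      # (B, H*W, C) queries from the anchor
        kv = neighbor_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) keys/values from the neighbor
        routed, _ = self.attn(q, kv, kv)                # attention routes reliable radiance cues
        fused = self.norm(q + routed)                   # residual injection into the anchor stream
        return fused.transpose(1, 2).reshape(b, c, h, w)


# Toy usage: fuse cues from one unaligned neighbor into the anchor features.
if __name__ == "__main__":
    attn = CollaborativeAttention(channels=64)
    anchor = torch.randn(1, 64, 32, 32)
    neighbor = torch.randn(1, 64, 32, 32)
    print(attn(anchor, neighbor).shape)  # torch.Size([1, 64, 32, 32])
```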
📝 Abstract
Prevailing High Dynamic Range (HDR) video reconstruction methods are fundamentally trapped in a fragile alignment-and-fusion paradigm. While explicit spatial alignment can successfully recover fine details in controlled environments, it becomes a severe bottleneck in unconstrained dynamic scenes. By forcing rigid alignment across unpredictable motions and varying exposures, these methods inevitably translate registration errors into severe ghosting artifacts and temporal flickering. In this paper, we rethink this conventional prerequisite. Recognizing that explicit alignment is inherently vulnerable to real-world complexities, we propose LoCAtion, a Long-time Collaborative Attention framework that reformulates HDR video reconstruction from a fragile spatial warping task into a robust, alignment-free collaborative feature routing problem. Guided by this new formulation, our architecture explicitly decouples the highly entangled reconstruction task. Rather than struggling to rigidly warp neighboring frames, we anchor the scene on a continuous medium-exposure backbone and utilize collaborative attention to dynamically harvest and inject reliable irradiance cues from unaligned exposures. Furthermore, we introduce a learned global sequence solver. By leveraging bidirectional context and long-range temporal modeling, it propagates corrective signals and structural features across the entire sequence, inherently enforcing whole-video coherence and eliminating jitter. Extensive experiments demonstrate that LoCAtion achieves state-of-the-art visual quality and temporal stability, offering a highly competitive balance between accuracy and computational efficiency.
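The abstract describes the learned global sequence solver only at a high level. One way to picture bidirectional, long-range temporal propagation is a bidirectional recurrent pass over per-frame descriptors whose output is broadcast back onto every frame as a corrective signal. The sketch below is an assumed stand-in for the authors' solver; `GlobalSequenceSolver` and its pooling-based descriptor are illustrative choices, not details from the paper.

```python
# Minimal sketch (assumed, not the paper's solver): bidirectional long-range
# temporal modeling over the whole sequence, propagating a corrective signal
# to every frame to enforce whole-video coherence.
import torch
import torch.nn as nn


class GlobalSequenceSolver(nn.Module):
    def __init__(self, channels: int, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, channels)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, C, H, W) per-frame features for the entire video
        b, t, c, h, w = frame_feats.shape
        desc = frame_feats.mean(dim=(3, 4))        # (B, T, C) global per-frame descriptors
        context, _ = self.rnn(desc)                # (B, T, 2*hidden) bidirectional context
        correction = self.proj(context)            # (B, T, C) per-frame corrective signal
        # Broadcast the sequence-level correction onto every spatial location,
        # nudging all frames toward a temporally coherent reconstruction.
        return frame_feats + correction.view(b, t, c, 1, 1)
```

Because the recurrence runs over the full sequence in both directions, every frame's correction is conditioned on the whole video rather than a local window, which is the property the abstract attributes to the solver.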