Wavelet-Domain Masked Image Modeling for Color-Consistent HDR Video Reconstruction

📅 2026-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes WMNet, a novel deep learning framework for high dynamic range (HDR) video reconstruction from low dynamic range (LDR) inputs, specifically addressing color distortion and temporal inconsistency. The method introduces Wavelet-domain Masked Image Modeling (W-MIM) into HDR video reconstruction for the first time, employing a two-stage training strategy: the first stage leverages curriculum learning for self-supervised pretraining in the wavelet domain, while the second stage integrates a Temporal Mixture of Experts (T-MoE) and a Dynamic Memory Module (DMM) to enhance inter-frame consistency and preserve fine details. Additionally, the authors construct a new benchmark dataset, HDRTV4K-Scene. Experimental results demonstrate that WMNet significantly outperforms existing approaches in terms of color fidelity, temporal coherence, and perceptual quality.

📝 Abstract
High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from Low Dynamic Range (LDR) videos. However, existing methods often suffer from color inaccuracies and temporal inconsistencies. To address these challenges, we propose WMNet, a novel HDR video reconstruction network that leverages Wavelet-domain Masked Image Modeling (W-MIM). WMNet adopts a two-phase training strategy: In Phase I, W-MIM performs self-reconstruction pre-training by selectively masking color and detail information in the wavelet domain, enabling the network to develop robust color restoration capabilities. A curriculum learning scheme further refines the reconstruction process. Phase II fine-tunes the model using the pre-trained weights to improve the final reconstruction quality. To improve temporal consistency, we introduce the Temporal Mixture of Experts (T-MoE) module and the Dynamic Memory Module (DMM). T-MoE adaptively fuses adjacent frames to reduce flickering artifacts, while DMM captures long-range dependencies, ensuring smooth motion and preservation of fine details. Additionally, since existing HDR video datasets lack scene-based segmentation, we reorganize HDRTV4K into HDRTV4K-Scene, establishing a new benchmark for HDR video reconstruction. Extensive experiments demonstrate that WMNet achieves state-of-the-art performance across multiple evaluation metrics, significantly improving color fidelity, temporal coherence, and perceptual quality. The code is available at: https://github.com/eezkni/WMNet
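The abstract describes W-MIM as selectively masking detail information in the wavelet domain during self-reconstruction pre-training. The paper's method operates inside a trained network; the toy sketch below only illustrates the corruption step under stated assumptions (a single-level Haar transform and a random per-coefficient mask over the detail subbands). All function names here are hypothetical, not from the paper's codebase.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT: approximation (LL) plus three detail subbands."""
    a = (x[:, 0::2] + x[:, 1::2]) / 2.0          # row lowpass
    d = (x[:, 0::2] - x[:, 1::2]) / 2.0          # row highpass
    ll = (a[0::2, :] + a[1::2, :]) / 2.0         # approximation
    lh = (a[0::2, :] - a[1::2, :]) / 2.0         # horizontal detail
    hl = (d[0::2, :] + d[1::2, :]) / 2.0         # vertical detail
    hh = (d[0::2, :] - d[1::2, :]) / 2.0         # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    a = np.empty((2 * h, w)); d = np.empty((2 * h, w))
    a[0::2, :] = ll + lh; a[1::2, :] = ll - lh
    d[0::2, :] = hl + hh; d[1::2, :] = hl - hh
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2] = a + d
    x[:, 1::2] = a - d
    return x

def wavelet_mask(x, rng, mask_ratio=0.5):
    """W-MIM-style corruption: randomly zero detail coefficients, keep LL intact."""
    ll, lh, hl, hh = haar_dwt2(x)
    keep = rng.random(ll.shape) > mask_ratio     # shared per-coefficient mask
    return haar_idwt2(ll, lh * keep, hl * keep, hh * keep)

rng = np.random.default_rng(0)
img = rng.random((8, 8))
corrupted = wavelet_mask(img, rng)
# The approximation band is untouched, so coarse structure is preserved
# while fine detail is hidden for the network to reconstruct:
assert np.allclose(haar_dwt2(img)[0], haar_dwt2(corrupted)[0])
```

A network pre-trained to map `corrupted` back to `img` is forced to hallucinate plausible detail from coarse context, which is the self-reconstruction objective the abstract attributes to Phase I.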
Problem

Research questions and friction points this paper is trying to address.

HDR video reconstruction
color inaccuracies
temporal inconsistencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wavelet-domain Masked Image Modeling
Temporal Mixture of Experts
Dynamic Memory Module
HDR video reconstruction
Curriculum learning
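Among the contributions above, T-MoE is described as adaptively fusing adjacent frames to suppress flicker. The paper's module uses learned experts and a learned gate; the minimal numpy sketch below only illustrates the underlying idea of gate-weighted temporal fusion, with the frames themselves standing in for experts and a hand-rolled per-pixel similarity gate. Everything here is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_fuse(prev, cur, nxt):
    """Gate-weighted fusion of three adjacent frames.

    The gate scores each frame's per-pixel similarity to the current frame,
    so pixels that agree across time are averaged (suppressing flicker)
    while pixels that disagree (motion) stay dominated by the current frame.
    """
    frames = np.stack([prev, cur, nxt])            # (3, H, W)
    logits = -(frames - cur) ** 2                  # similarity to current frame
    weights = softmax(logits, axis=0)              # (3, H, W), sums to 1 over time
    return (weights * frames).sum(axis=0)
```

With identical inputs the gate weights are uniform and the frame passes through unchanged; with disagreeing inputs the output stays within the per-pixel range of the three frames, which is the smoothing behavior a temporal fusion gate is meant to provide.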
Yang Zhang
School of Computer Science and Technology and the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China
Zhangkai Ni
School of Computer Science and Technology and the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China
Wenhan Yang
Ph.D. student of Computer Science, University of California, Los Angeles
Self-supervised Learning, Model Robustness
Hanli Wang
Tongji University
Multimedia Computing, Computer Vision, Image Processing, Machine Learning