Wavelet-Domain Masked Image Modeling for Color-Consistent HDR Video Reconstruction

📅 2026-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes WMNet, a novel deep learning framework for high dynamic range (HDR) video reconstruction from low dynamic range (LDR) inputs, specifically addressing color distortion and temporal inconsistency. The method introduces Wavelet-domain Masked Image Modeling (W-MIM) into HDR video reconstruction for the first time, employing a two-stage training strategy: the first stage leverages curriculum learning for self-supervised pretraining in the wavelet domain, while the second stage integrates a Temporal Mixture of Experts (T-MoE) and a Dynamic Memory Module (DMM) to enhance inter-frame consistency and preserve fine details. Additionally, the authors construct a new benchmark dataset, HDRTV4K-Scene. Experimental results demonstrate that WMNet significantly outperforms existing approaches in terms of color fidelity, temporal coherence, and perceptual quality.

📝 Abstract
High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from Low Dynamic Range (LDR) videos. However, existing methods often suffer from color inaccuracies and temporal inconsistencies. To address these challenges, we propose WMNet, a novel HDR video reconstruction network that leverages Wavelet-domain Masked Image Modeling (W-MIM). WMNet adopts a two-phase training strategy: In Phase I, W-MIM performs self-reconstruction pre-training by selectively masking color and detail information in the wavelet domain, enabling the network to develop robust color restoration capabilities. A curriculum learning scheme further refines the reconstruction process. Phase II fine-tunes the model using the pre-trained weights to improve the final reconstruction quality. To improve temporal consistency, we introduce the Temporal Mixture of Experts (T-MoE) module and the Dynamic Memory Module (DMM). T-MoE adaptively fuses adjacent frames to reduce flickering artifacts, while DMM captures long-range dependencies, ensuring smooth motion and preservation of fine details. Additionally, since existing HDR video datasets lack scene-based segmentation, we reorganize HDRTV4K into HDRTV4K-Scene, establishing a new benchmark for HDR video reconstruction. Extensive experiments demonstrate that WMNet achieves state-of-the-art performance across multiple evaluation metrics, significantly improving color fidelity, temporal coherence, and perceptual quality. The code is available at: https://github.com/eezkni/WMNet
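The abstract describes W-MIM as selectively masking detail information in the wavelet domain during self-reconstruction pre-training. The paper's method operates inside a trained network; the toy sketch below only illustrates the corruption step under stated assumptions (a single-level Haar transform and a random per-coefficient mask over the detail subbands). All function names here are hypothetical, not from the paper's codebase.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT: approximation (LL) plus three detail subbands."""
    a = (x[:, 0::2] + x[:, 1::2]) / 2.0          # row lowpass
    d = (x[:, 0::2] - x[:, 1::2]) / 2.0          # row highpass
    ll = (a[0::2, :] + a[1::2, :]) / 2.0         # approximation
    lh = (a[0::2, :] - a[1::2, :]) / 2.0         # horizontal detail
    hl = (d[0::2, :] + d[1::2, :]) / 2.0         # vertical detail
    hh = (d[0::2, :] - d[1::2, :]) / 2.0         # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    a = np.empty((2 * h, w)); d = np.empty((2 * h, w))
    a[0::2, :] = ll + lh; a[1::2, :] = ll - lh
    d[0::2, :] = hl + hh; d[1::2, :] = hl - hh
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2] = a + d
    x[:, 1::2] = a - d
    return x

def wavelet_mask(x, rng, mask_ratio=0.5):
    """W-MIM-style corruption: randomly zero detail coefficients, keep LL intact."""
    ll, lh, hl, hh = haar_dwt2(x)
    keep = rng.random(ll.shape) > mask_ratio     # shared per-coefficient mask
    return haar_idwt2(ll, lh * keep, hl * keep, hh * keep)

rng = np.random.default_rng(0)
img = rng.random((8, 8))
corrupted = wavelet_mask(img, rng)
# The approximation band is untouched, so coarse structure is preserved
# while fine detail is hidden for the network to reconstruct:
assert np.allclose(haar_dwt2(img)[0], haar_dwt2(corrupted)[0])
```

A network pre-trained to map `corrupted` back to `img` is forced to hallucinate plausible detail from coarse context, which is the self-reconstruction objective the abstract attributes to Phase I.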
Problem

Research questions and friction points this paper is trying to address.

HDR video reconstruction
color inaccuracies
temporal inconsistencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wavelet-domain Masked Image Modeling
Temporal Mixture of Experts
Dynamic Memory Module
HDR video reconstruction
Curriculum learning
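Among the contributions above, T-MoE is described as adaptively fusing adjacent frames to suppress flicker. The paper's module uses learned experts and a learned gate; the minimal numpy sketch below only illustrates the underlying idea of gate-weighted temporal fusion, with the frames themselves standing in for experts and a hand-rolled per-pixel similarity gate. Everything here is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_fuse(prev, cur, nxt):
    """Gate-weighted fusion of three adjacent frames.

    The gate scores each frame's per-pixel similarity to the current frame,
    so pixels that agree across time are averaged (suppressing flicker)
    while pixels that disagree (motion) stay dominated by the current frame.
    """
    frames = np.stack([prev, cur, nxt])            # (3, H, W)
    logits = -(frames - cur) ** 2                  # similarity to current frame
    weights = softmax(logits, axis=0)              # (3, H, W), sums to 1 over time
    return (weights * frames).sum(axis=0)
```

With identical inputs the gate weights are uniform and the frame passes through unchanged; with disagreeing inputs the output stays within the per-pixel range of the three frames, which is the smoothing behavior a temporal fusion gate is meant to provide.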
Yang Zhang
School of Computer Science and Technology and the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China
Zhangkai Ni
School of Computer Science and Technology and the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China
Wenhan Yang
Ph.D. student of Computer Science, University of California, Los Angeles
Self-supervised Learning, Model Robustness
Hanli Wang
Tongji University
Multimedia Computing, Computer Vision, Image Processing, Machine Learning