🤖 AI Summary
The SDRTV-to-HDRTV conversion task suffers from an ill-posed mapping, poor generalization, and insufficient perceptual realism, owing to the limited information in SDR inputs and the diversity of target HDR styles. To address these challenges, this paper proposes a two-stage conversion framework guided by real-world HDRTV priors. It is the first work to introduce authentic HDRTV content as a reference prior for this task, transforming reference-free prediction into reference-guided selection. Specifically, the method employs a VQ-GAN to model the HDR prior distribution and designs a content-matching mechanism that enables SDR-driven retrieval and reconstruction from the learned prior. Evaluated on both real and synthetic benchmarks, it achieves state-of-the-art performance on quantitative metrics (including PSNR, SSIM, and LPIPS) as well as in human subjective assessments, significantly enhancing the perceptual fidelity and stylistic robustness of converted HDR videos.
📝 Abstract
The rise of HDR-WCG display devices has highlighted the need to convert SDRTV to HDRTV, as most video sources are still in SDR. Existing methods primarily focus on designing neural networks to learn a single-style mapping from SDRTV to HDRTV. However, the limited information in SDRTV and the diversity of styles in real-world conversions make this an ill-posed problem, constraining the performance and generalization of these methods. Inspired by generative approaches, we propose a novel method for SDRTV-to-HDRTV conversion guided by real HDRTV priors. Although SDRTV carries limited information, introducing real HDRTV as a reference prior substantially constrains the solution space of this high-dimensional ill-posed problem, transforming the task from unreferenced prediction into referenced selection and thereby markedly enhancing the accuracy and reliability of the conversion. Specifically, our approach comprises two stages: the first employs a Vector Quantized Generative Adversarial Network (VQ-GAN) to capture HDRTV priors, while the second matches these priors to the input SDRTV content to recover realistic HDRTV outputs. We evaluate our method on public datasets, demonstrating significant improvements in both objective and subjective metrics across real and synthetic datasets.
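The "referenced selection" idea at the heart of the second stage can be illustrated with the standard vector-quantization lookup used by VQ-GAN-style models: instead of regressing HDR features directly, each SDR-derived feature selects its nearest entry from a learned codebook of HDR priors. The sketch below is a minimal, hypothetical stand-in (random codebook, toy dimensions, no learned encoder or GAN training) and does not reproduce the paper's actual architecture or content-matching mechanism.

```python
import numpy as np

# Hypothetical stand-in for the VQ-GAN codebook: in the paper this is
# learned from real HDRTV content; here it is random for illustration.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))  # 512 "HDR prior" codes of dim 64


def quantize(features, codebook):
    """Replace each feature vector with its nearest codebook entry.

    features: (N, D) array, e.g. encoder outputs for SDR patches.
    Returns the quantized features and the selected code indices.
    """
    # Squared Euclidean distance from every feature to every code.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    # Selection, not prediction: pick the closest existing HDR prior code.
    indices = dists.argmin(axis=1)
    return codebook[indices], indices


# Toy SDR features standing in for the output of an SDR encoder.
sdr_feats = rng.normal(size=(4, 64))
quantized, indices = quantize(sdr_feats, codebook)
```

Because the output is always drawn from the codebook, any decoded result stays on the manifold of (learned) HDR content, which is how a reference prior constrains the otherwise ill-posed solution space.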