🤖 AI Summary
This work addresses the limited effectiveness of linear color spaces in RAW/HDR image restoration (denoising, deblurring, super-resolution). It systematically compares training strategies and shows that replacing conventional linear color spaces with display-encoded representations (specifically the PQ, PU21, and mu-law transfer functions) better aligns neural network optimization with human visual perception. The key contribution is empirical evidence that perceptually uniform display encodings significantly accelerate convergence and improve reconstruction quality, outperforming the alternative of training in linear color spaces with loss functions that correct for perceptual non-uniformity. Across denoising, deblurring, and super-resolution tasks, this change to the training strategy yields consistent PSNR gains of 2-9 dB. Crucially, display encoding is an invertible transformation, so the physical interpretability of RAW/HDR data is preserved while both perceptual fidelity and deep-learning optimization requirements are better satisfied. This establishes a generalizable, perception-aware training recipe for high-dynamic-range image restoration.
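For intuition, the mu-law and PQ transfer functions named above can be sketched as follows. This is an illustration only, not the paper's code: the choice of mu = 5000 and the normalization of linear values to [0, 1] are assumptions, while the PQ constants come from SMPTE ST 2084.

```python
import numpy as np

def mu_law_encode(x, mu=5000.0):
    """Mu-law display encoding: compresses linear HDR values in [0, 1]
    into a roughly perceptually uniform [0, 1] range."""
    return np.log1p(mu * x) / np.log1p(mu)

def pq_encode(y):
    """SMPTE ST 2084 (PQ) inverse EOTF. `y` is linear luminance
    normalized so that 1.0 corresponds to the 10,000 cd/m^2 peak."""
    m1, m2 = 2610 / 16384, 2523 / 32
    c1, c2, c3 = 3424 / 4096, 2413 / 128, 2392 / 128
    yp = np.power(np.clip(y, 0.0, 1.0), m1)
    return np.power((c1 + c2 * yp) / (1.0 + c3 * yp), m2)

# Both encodings allocate most of the output range to dark tones,
# mirroring the eye's higher sensitivity at low luminance.
x = np.linspace(0.0, 1.0, 5)
print(mu_law_encode(x))  # monotone, maps 0 -> 0 and 1 -> 1
print(pq_encode(x))
```

Both curves are monotone and invertible, which is why display encoding can be applied to network inputs and targets without losing the underlying linear data.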
📝 Abstract
The vast majority of standard image and video content available online is represented in display-encoded color spaces, in which pixel values are conveniently scaled to a limited range (0-1) and the color distribution is approximately perceptually uniform. In contrast, both camera RAW and high dynamic range (HDR) images are often represented in linear color spaces, in which color values are linearly related to colorimetric quantities of light. While training on commonly available display-encoded images is a well-established practice, there is no consensus on how neural networks should be trained for tasks on RAW and HDR images in linear color spaces. In this work, we test several approaches on three popular image restoration applications: denoising, deblurring, and single-image super-resolution. We examine whether HDR/RAW images need to be display-encoded using popular transfer functions (PQ, PU21, and mu-law), or whether it is better to train in linear color spaces, but use loss functions that correct for perceptual non-uniformity. Our results indicate that neural networks train significantly better on HDR and RAW images represented in display-encoded color spaces, which offer better perceptual uniformity than linear spaces. This small change to the training strategy can bring a very substantial gain in performance, between 2 and 9 dB.
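The two alternatives the abstract compares can be sketched as a choice of loss domain: compute the training loss on display-encoded pixel values, or compute it directly in the linear color space. The mu-law encoder with mu = 5000 is an illustrative assumption, not the paper's specific configuration.

```python
import numpy as np

def mu_law(x, mu=5000.0):
    # Mu-law display encoding of linear values in [0, 1].
    return np.log1p(mu * x) / np.log1p(mu)

def linear_mse(pred, target):
    # Loss in the linear color space: dominated by absolute errors,
    # which mostly occur in bright pixels.
    return np.mean((pred - target) ** 2)

def encoded_mse(pred, target):
    # Loss after display encoding: errors are weighted roughly
    # according to their perceptual visibility.
    return np.mean((mu_law(pred) - mu_law(target)) ** 2)

# The same absolute error is far more visible in shadows than in
# highlights; the encoded loss reflects this, the linear loss does not.
dark = encoded_mse(np.array([0.011]), np.array([0.01]))
bright = encoded_mse(np.array([0.901]), np.array([0.9]))
print(dark > bright)  # True: equal linear error, larger perceptual penalty in the dark
```

This asymmetry is the mechanism behind the reported gains: a network trained in the encoded domain spends its capacity where errors are actually visible.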