X2HDR: HDR Image Generation in a Perceptually Uniform Space

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image generation models struggle to directly produce high dynamic range (HDR) images, primarily due to the scarcity of large-scale HDR datasets and the statistical discrepancies between sRGB and linear RGB domains. This work proposes mapping HDR images into perceptually uniform color spaces—such as PU21 or PQ—and leveraging a pre-trained low dynamic range (LDR) variational autoencoder (VAE) while applying only low-rank adaptation (LoRA) to the denoising network. This approach enables both text-to-HDR synthesis and single-image RAW-to-HDR reconstruction without full retraining. Notably, it demonstrates for the first time that perceptually uniform encoding effectively bridges the gap between LDR and HDR domains. By keeping the VAE frozen, the method unifies support for diverse HDR generation tasks and substantially improves perceptual fidelity, text alignment, and effective dynamic range over existing techniques.
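The summary hinges on mapping HDR pixels into a perceptually uniform space before they reach the LDR-pretrained VAE. As a concrete illustration, here is a minimal sketch of the PQ transfer function (SMPTE ST 2084), one of the two encodings the summary names; whether the paper uses PQ or PU21, and how it normalizes luminance, is not specified here, so treat the peak-luminance choice as an assumption.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants, defined as exact rationals in the standard.
M1 = 2610 / 16384        # 0.1593017578125
M2 = 2523 / 4096 * 128   # 78.84375
C1 = 3424 / 4096         # 0.8359375
C2 = 2413 / 4096 * 32    # 18.8515625
C3 = 2392 / 4096 * 32    # 18.6875
PEAK_NITS = 10000.0      # PQ reference peak luminance (assumption for this sketch)

def pq_encode(linear_nits):
    """Map linear luminance (cd/m^2) to the perceptually uniform PQ signal in [0, 1]."""
    y = np.clip(np.asarray(linear_nits, dtype=np.float64) / PEAK_NITS, 0.0, 1.0)
    y_m1 = np.power(y, M1)
    return np.power((C1 + C2 * y_m1) / (1.0 + C3 * y_m1), M2)

def pq_decode(signal):
    """Inverse mapping: PQ signal in [0, 1] back to linear luminance (cd/m^2)."""
    e = np.clip(np.asarray(signal, dtype=np.float64), 0.0, 1.0)
    e_inv = np.power(e, 1.0 / M2)
    y = np.power(np.maximum(e_inv - C1, 0.0) / (C2 - C3 * e_inv), 1.0 / M1)
    return y * PEAK_NITS
```

The encoding compresses the huge linear range (up to 10,000 cd/m²) into [0, 1] with roughly uniform perceptual spacing, which is why its statistics resemble those of sRGB-encoded LDR images far more than raw linear RGB does.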

📝 Abstract
High-dynamic-range (HDR) formats and displays are becoming increasingly prevalent, yet state-of-the-art image generators (e.g., Stable Diffusion and FLUX) typically remain limited to low-dynamic-range (LDR) output due to the lack of large-scale HDR training data. In this work, we show that existing pretrained diffusion models can be easily adapted to HDR generation without retraining from scratch. A key challenge is that HDR images are natively represented in linear RGB, whose intensity and color statistics differ substantially from those of sRGB-encoded LDR images. This gap, however, can be effectively bridged by converting HDR inputs into perceptually uniform encodings (e.g., using PU21 or PQ). Empirically, we find that LDR-pretrained variational autoencoders (VAEs) reconstruct PU21-encoded HDR inputs with fidelity comparable to LDR data, whereas linear RGB inputs cause severe degradations. Motivated by this finding, we describe an efficient adaptation strategy that freezes the VAE and finetunes only the denoiser via low-rank adaptation in a perceptually uniform space. This results in a unified computational method that supports both text-to-HDR synthesis and single-image RAW-to-HDR reconstruction. Experiments demonstrate that our perceptually encoded adaptation consistently improves perceptual fidelity, text-image alignment, and effective dynamic range, relative to previous techniques.
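The adaptation strategy in the abstract freezes the VAE and finetunes only the denoiser via low-rank adaptation (LoRA). A minimal numerical sketch of the LoRA idea, independent of any diffusion framework, is below; all dimensions and names are hypothetical and the frozen matrix merely stands in for one denoiser layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 64, 64, 4, 8.0

# Frozen pretrained weight (stands in for one denoiser layer; never updated).
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially reproduces the pretrained behavior exactly.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def lora_forward(x, W, A, B, alpha, rank):
    """y = W x + (alpha / rank) * B (A x): frozen base path plus low-rank correction."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y_init = lora_forward(x, W, A, B, alpha, rank)   # identical to W @ x at init
```

Only `A` and `B` are trained, so the number of updated parameters is `rank * (d_in + d_out)` rather than `d_in * d_out`, which is what makes adapting a large pretrained denoiser to the HDR domain cheap.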
Problem

Research questions and friction points this paper is trying to address.

HDR image generation
perceptually uniform space
diffusion models
dynamic range
image synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

perceptually uniform encoding
HDR image generation
low-rank adaptation
diffusion models
VAE reconstruction