🤖 AI Summary
Existing generative models struggle to synthesize high dynamic range (HDR) images efficiently: they typically run a separate generation pass per exposure, which incurs substantial computational overhead and introduces structural inconsistencies across exposures. This work proposes decoupling scene generation from exposure modeling in latent space. A pre-trained diffusion backbone generates a coherent scene representation in a single pass, and a lightweight conditional mapping head then produces a dense, structurally consistent stack of exposures from it. To the best of our knowledge, this is the first method capable of generating high-quality HDR exposure stacks in a single pass, improving both efficiency and consistency. Experiments on synthetic data and the SI-HDR benchmark show state-of-the-art dynamic range with competitive perceptual quality, while reducing computational cost by an order of magnitude.
📝 Abstract
High Dynamic Range (HDR) generation remains challenging for generative models, which are largely limited to low dynamic range outputs. Recent diffusion-based approaches approximate HDR by generating multiple exposure-conditioned samples, incurring high computational cost and structural inconsistencies across exposures. We propose LatentHDR, a framework that decouples scene generation from exposure modeling in latent space. A pretrained diffusion backbone produces a single coherent scene representation, while a lightweight conditional latent-to-latent head deterministically maps it to exposure-specific representations. This enables the generation of a dense, structurally consistent exposure stack in a single pass. This design eliminates multi-pass diffusion, ensures cross-exposure alignment, and enables scalable HDR synthesis. LatentHDR supports both text- and image-conditioned HDR generation for perspective and panoramic scenes. Experiments on synthetic data and the SI-HDR benchmark show that LatentHDR achieves state-of-the-art dynamic range with competitive perceptual quality, while reducing computation by an order of magnitude. Our results demonstrate that high-quality HDR generation can be achieved through structured latent modeling, challenging the need for stochastic multi-exposure generation.
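The decoupling described in the abstract can be illustrated with a toy sketch. This is not the LatentHDR implementation: the abstract gives no architectural details, so every name here (`sample_scene_latent`, `exposure_head`, `exposure_stack`) and the `gain` and `bias` parameters are hypothetical. A seeded random vector stands in for the diffusion backbone, and a fixed affine map stands in for the learned latent-to-latent head. The sketch only shows the control flow: the expensive stochastic step runs once, and each exposure is then a cheap deterministic function of the same scene latent.

```python
import math
import random


def sample_scene_latent(dim, seed):
    """Stand-in for the diffusion backbone (assumption): the only stochastic,
    expensive step, executed once per scene."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def exposure_head(scene_latent, ev, gain=0.25, bias=0.1):
    """Hypothetical deterministic head. The real head would be a learned
    network; here it is a fixed affine map conditioned on exposure value ev."""
    scale = math.exp(gain * ev)  # equals 1.0 at ev = 0
    shift = bias * ev            # equals 0.0 at ev = 0
    return [scale * z + shift for z in scene_latent]


def exposure_stack(scene_latent, evs):
    """Map one shared scene latent to one latent per exposure value."""
    return [exposure_head(scene_latent, ev) for ev in evs]


scene = sample_scene_latent(dim=8, seed=0)  # backbone runs once
evs = [-4, -2, 0, 2, 4]                     # illustrative exposure values
stack = exposure_stack(scene, evs)          # no further sampling needed
```

Because every entry in `stack` derives from the same `scene` latent with no additional sampling, the exposures cannot drift apart structurally, which is the consistency argument the abstract makes against multi-pass generation. Adding more exposure values costs one extra head evaluation each rather than one extra diffusion run.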