Linear Image Generation by Synthesizing Exposure Brackets

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing generative models struggle to synthesize linear images with full dynamic range, which limits their use in professional post-processing workflows. This work proposes the first text-to-linear-image generation method. It models a linear image as a sequence of exposure brackets and uses a DiT architecture trained with flow matching to generate them, from which high dynamic range content is reconstructed. The approach combines VAE latent-space modeling, ControlNet for structural conditioning, and a linear-domain post-processing pipeline, supporting both text-guided and structure-conditioned generation. Experimental results show that the generated images outperform existing methods in visual fidelity and editing flexibility.
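
To make the flow-matching component more concrete, the following is a minimal sketch of how a training pair is commonly built under the linear-interpolation (rectified-flow style) formulation. It is a generic illustration and not the paper's implementation: the names `flow_matching_pair` and `flow_matching_loss`, the latent shape, and the uniform sampling of `t` are assumptions, and the DiT that would predict the velocity is omitted.

```python
import numpy as np

def flow_matching_pair(x1, rng):
    """Build one training pair for a batch of data latents x1 (hypothetical helper)."""
    # x0 is pure Gaussian noise with the same shape as the data latents.
    x0 = rng.standard_normal(x1.shape)
    # One time value per batch element, broadcast over the remaining axes.
    t = rng.uniform(size=(x1.shape[0],) + (1,) * (x1.ndim - 1))
    # Point on the straight path from noise (t = 0) to data (t = 1).
    xt = (1.0 - t) * x0 + t * x1
    # The velocity along a straight path is constant: data minus noise.
    v_target = x1 - x0
    return xt, t, v_target

def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((v_pred - v_target) ** 2))

# Usage: a stand-in batch of 4 latents, each with 3 brackets of 8 values (assumed shape).
rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 3, 8))
xt, t, v_target = flow_matching_pair(latents, rng)
```

In a real training loop a network would receive `xt`, `t`, and the text embedding, and its output would take the place of `v_pred` in the loss.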

📝 Abstract

The life of a photo begins with photons striking the sensor, whose signals are passed through a sophisticated image signal processing (ISP) pipeline to produce a display-referred image. However, such images are no longer faithful to the incident light, being compressed in dynamic range and stylized by subjective preferences. In contrast, RAW images record direct sensor signals before non-linear tone mapping. After camera response curve correction and demosaicing, they can be converted into linear images, which are scene-referred representations that directly reflect true irradiance and are invariant to sensor-specific factors. Since image sensors have better dynamic range and bit depth, linear images contain richer information than display-referred ones, leaving users more room for editing during post-processing. Despite this advantage, current generative models mainly synthesize display-referred images, which inherently limits downstream editing. In this paper, we address the task of text-to-linear-image generation: synthesizing a high-quality, scene-referred linear image that preserves full dynamic range, conditioned on a text prompt, for professional post-processing. Generating linear images is challenging, as pre-trained VAEs in latent diffusion models struggle to simultaneously preserve extreme highlights and shadows due to the higher dynamic range and bit depth. To this end, we represent a linear image as a sequence of exposure brackets, each capturing a specific portion of the dynamic range, and propose a DiT-based flow-matching architecture for text-conditioned exposure bracket generation. We further demonstrate downstream applications including text-guided linear image editing and structure-conditioned generation via ControlNet.
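
The exposure-bracket representation described in the abstract can be illustrated with a small sketch. It splits a linear image into clipped brackets at different exposure offsets and then merges them back into a single linear image. This is an assumption-laden toy example, not the authors' pipeline: the function names `to_brackets` and `merge_brackets`, the three offsets of -4, 0, and +4 stops, and the saturation-mask weighting are all hypothetical choices made for illustration.

```python
import numpy as np

EVS = (-4.0, 0.0, 4.0)  # assumed exposure offsets in stops, not taken from the paper

def to_brackets(linear, evs=EVS):
    """Split a scene-referred linear image into clipped exposure brackets."""
    linear = np.asarray(linear, dtype=np.float64)
    # Each bracket scales irradiance by 2**ev and clips to [0, 1],
    # so it keeps only one portion of the dynamic range.
    return [np.clip(linear * 2.0 ** ev, 0.0, 1.0) for ev in evs]

def merge_brackets(brackets, evs=EVS):
    """Recombine brackets into one linear image (simple weighted average)."""
    num = np.zeros_like(brackets[0])
    den = np.zeros_like(brackets[0])
    for b, ev in zip(brackets, evs):
        gain = 2.0 ** ev
        # Ignore saturated pixels; weight brighter exposures more heavily.
        w = np.where(b < 1.0, gain, 0.0)
        num += w * (b / gain)  # undo the exposure gain before averaging
        den += w
    # Pixels saturated in every bracket fall back to the darkest bracket.
    darkest = int(np.argmin(evs))
    fallback = brackets[darkest] / 2.0 ** evs[darkest]
    return np.where(den > 0, num / np.maximum(den, 1e-12), fallback)

# Usage: values above 1.0 are highlights that a single [0, 1] image would clip.
hdr = np.array([[0.0, 0.001, 0.5], [2.0, 10.0, 15.9]])
brackets = to_brackets(hdr)
recovered = merge_brackets(brackets)
```

With these offsets, irradiance values below 16 are recovered from at least one unsaturated bracket, while any single bracket alone loses either the highlights or the precision in the shadows once it is quantized.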

Problem

Research questions and friction points this paper is trying to address.

linear image generation

text-to-image synthesis

high dynamic range

scene-referred representation

exposure brackets

Innovation

Methods, ideas, or system contributions that make the work stand out.

linear image generation

exposure brackets

text-to-image synthesis