Linear Image Generation by Synthesizing Exposure Brackets

📅 2026-04-22

🤖 AI Summary
Existing generative models mainly synthesize display-referred images and struggle to produce linear images that keep the full dynamic range, which limits their use in professional post-processing workflows. This work addresses text-to-linear-image generation by representing a linear image as a sequence of exposure brackets, each covering part of the dynamic range, and generating that sequence with a DiT-based flow-matching architecture conditioned on a text prompt. The approach works in a VAE latent space, supports structure-conditioned generation via ControlNet, and includes a linear-domain post-processing pipeline, so it covers both text-guided and structure-conditioned use. According to the summary, the generated images outperform existing methods in visual fidelity and editing flexibility.

📝 Abstract
The life of a photo begins with photons striking the sensor, whose signals are passed through a sophisticated image signal processing (ISP) pipeline to produce a display-referred image. However, such images are no longer faithful to the incident light, being compressed in dynamic range and stylized by subjective preferences. In contrast, RAW images record direct sensor signals before non-linear tone mapping. After camera response curve correction and demosaicing, they can be converted into linear images, which are scene-referred representations that directly reflect true irradiance and are invariant to sensor-specific factors. Since image sensors have better dynamic range and bit depth, linear images contain richer information than display-referred ones, leaving users more room for editing during post-processing. Despite this advantage, current generative models mainly synthesize display-referred images, which inherently limits downstream editing. In this paper, we address the task of text-to-linear-image generation: synthesizing a high-quality, scene-referred linear image that preserves full dynamic range, conditioned on a text prompt, for professional post-processing. Generating linear images is challenging, as pre-trained VAEs in latent diffusion models struggle to simultaneously preserve extreme highlights and shadows due to the higher dynamic range and bit depth. To this end, we represent a linear image as a sequence of exposure brackets, each capturing a specific portion of the dynamic range, and propose a DiT-based flow-matching architecture for text-conditioned exposure bracket generation. We further demonstrate downstream applications including text-guided linear image editing and structure-conditioned generation via ControlNet.
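
The exposure-bracket idea in the abstract can be illustrated with a small sketch. This is not the paper's method: the EV offsets, the 8-bit quantization, the clipping rule, and the function names `quantize`, `to_brackets`, and `merge_brackets` are all assumptions chosen for illustration. It only shows why several clipped, low-bit-depth exposures can jointly cover a range that a single one cannot.

```python
def quantize(value, bits=8):
    """Clip to [0, 1] and round to the given bit depth (assumed 8-bit here)."""
    levels = (1 << bits) - 1
    clipped = min(max(value, 0.0), 1.0)
    return round(clipped * levels) / levels


def to_brackets(linear, evs, bits=8):
    """Render one clipped, quantized exposure per EV offset.

    `linear` is a flat list of scene-referred values (may exceed 1.0).
    An offset of +1 EV doubles the exposure, -1 EV halves it.
    """
    return [[quantize(v * 2.0 ** ev, bits) for v in linear] for ev in evs]


def merge_brackets(brackets, evs):
    """Rebuild linear values from the brackets (hypothetical merge rule).

    For each pixel, take the brightest bracket that is not clipped and
    undo its exposure gain. If every bracket is clipped, fall back to
    the darkest one.
    """
    brightest_first = sorted(range(len(evs)), key=lambda k: evs[k], reverse=True)
    merged = []
    for i in range(len(brackets[0])):
        for k in brightest_first:
            if brackets[k][i] < 1.0:
                break
        # After the loop, k is the chosen bracket (darkest if all were clipped).
        merged.append(brackets[k][i] / 2.0 ** evs[k])
    return merged


# Toy scene: deep shadow, dark midtone, midtone, highlight far above display white.
scene = [0.0005, 0.02, 0.5, 6.0]
evs = [-3, 0, 3]

brackets = to_brackets(scene, evs)
merged = merge_brackets(brackets, evs)
single = to_brackets(scene, [0])[0]  # one 8-bit exposure for comparison

for s, one, m in zip(scene, single, merged):
    print(f"scene={s:<8} single_8bit={one:<10.5f} merged={m:.5f}")
```

In this toy setup the single 8-bit exposure crushes the deepest shadow to zero and clips the highlight, while the merged result stays close to the original scene values. A real pipeline would work on full images and would likely use a smoother per-pixel weighting than this hard selection.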

Problem

Research questions and friction points this paper is trying to address.

linear image generation
text-to-image synthesis
high dynamic range
scene-referred representation
exposure brackets

Innovation

Methods, ideas, or system contributions that make the work stand out.

linear image generation
exposure brackets
text-to-image synthesis
flow matching
scene-referred representation