RawGen: Learning Camera Raw Image Generation

📅 2026-03-31
🤖 AI Summary
Existing generative models are built to synthesize photo-finished sRGB images and struggle to produce physically consistent camera raw images, which limits their usefulness for low-level vision tasks. This work proposes RawGen, described by the authors as, to their knowledge, the first diffusion-based framework supporting both text-to-raw generation and sRGB-to-raw inversion. RawGen leverages the generative priors of large-scale sRGB diffusion models and fine-tunes a conditional denoiser and a specialized decoder to produce physically plausible linear outputs (CIE XYZ or camera-specific raw) through processing in latent and pixel spaces. To avoid assuming a single fixed ISP, the authors construct a many-to-one inverse-ISP dataset in which multiple sRGB renditions of the same scene, generated with diverse ISP parameters, are anchored to a common scene-referred target. The authors report that RawGen outperforms traditional inverse-ISP methods that assume a fixed ISP, and that its text-driven synthetic data can benefit downstream low-level vision tasks.
📝 Abstract
Cameras capture scene-referred linear raw images, which are processed by onboard image signal processors (ISPs) into display-referred 8-bit sRGB outputs. Although raw data is more faithful for low-level vision tasks, collecting large-scale raw datasets remains a major bottleneck, as existing datasets are limited and tied to specific camera hardware. Generative models offer a promising way to address this scarcity; however, existing diffusion frameworks are designed to synthesize photo-finished sRGB images rather than physically consistent linear representations. This paper presents RawGen, to our knowledge the first diffusion-based framework enabling text-to-raw generation for arbitrary target cameras, alongside sRGB-to-raw inversion. RawGen leverages the generative priors of large-scale sRGB diffusion models to synthesize physically meaningful linear outputs, such as CIE XYZ or camera-specific raw representations, via specialized processing in latent and pixel spaces. To handle unknown and diverse ISP pipelines and photo-finishing effects in diffusion-model training data, we build a many-to-one inverse-ISP dataset where multiple sRGB renditions of the same scene generated using diverse ISP parameters are anchored to a common scene-referred target. Fine-tuning a conditional denoiser and specialized decoder on this dataset allows RawGen to obtain camera-centric linear reconstructions that effectively invert the rendering pipeline. We demonstrate RawGen's superior performance over traditional inverse-ISP methods that assume a fixed ISP. Furthermore, we show that augmenting training pipelines with RawGen's scalable, text-driven synthetic data can benefit downstream low-level vision tasks.
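The many-to-one pairing described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the toy ISP below (an exposure gain, the standard sRGB transfer curve, a power-law contrast adjustment, and 8-bit quantization) and the names `toy_isp`, `naive_inverse`, and `make_many_to_one_pairs` are assumptions chosen only to show the idea. Several sRGB renditions of one linear image are each paired with the same linear target, and a fixed-ISP inverse recovers that target only for the rendition that matches its assumption.

```python
import numpy as np

def srgb_oetf(x):
    """Standard sRGB encoding: linear [0, 1] -> display-referred [0, 1]."""
    x = np.clip(x, 0.0, 1.0)
    return np.where(x <= 0.0031308, 12.92 * x, 1.055 * np.power(x, 1 / 2.4) - 0.055)

def srgb_eotf(y):
    """Standard sRGB decoding: display-referred [0, 1] -> linear [0, 1]."""
    y = np.clip(y, 0.0, 1.0)
    return np.where(y <= 0.04045, y / 12.92, np.power((y + 0.055) / 1.055, 2.4))

def toy_isp(linear, gain, contrast):
    """Hypothetical toy ISP (an assumption, far simpler than a real camera ISP):
    exposure gain -> sRGB encoding -> power-law contrast -> 8-bit quantization."""
    encoded = srgb_oetf(linear * gain) ** contrast
    return np.round(encoded * 255).astype(np.uint8)

def naive_inverse(srgb8):
    """Fixed-ISP inverse: assumes gain = 1 and contrast = 1, undoes only the sRGB curve."""
    return srgb_eotf(srgb8.astype(np.float64) / 255.0)

def make_many_to_one_pairs(linear, params):
    """Pair every sRGB rendition with the same scene-referred linear target."""
    return [(toy_isp(linear, g, c), linear) for g, c in params]

# Synthetic stand-in for a linear scene-referred image (values kept below clipping).
rng = np.random.default_rng(0)
linear = rng.uniform(0.0, 0.5, size=(4, 4, 3))

# Illustrative (gain, contrast) settings; the first matches the naive inverse's assumption.
params = [(1.0, 1.0), (1.5, 0.9), (0.8, 1.2)]
pairs = make_many_to_one_pairs(linear, params)

# Mean absolute error of the fixed-ISP inverse against the shared target.
errors = [float(np.mean(np.abs(naive_inverse(srgb) - target))) for srgb, target in pairs]
for (gain, contrast), err in zip(params, errors):
    print(f"gain={gain}, contrast={contrast}: fixed-ISP inverse MAE = {err:.4f}")
```

In this toy setting the fixed inverse is accurate up to quantization only for the first rendition and degrades for the others, which is the motivation for training an inverse model on many renditions per target rather than assuming a single ISP.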
Problem

Research questions and friction points this paper is trying to address.

raw image generation
data scarcity
camera-specific raw data
inverse ISP
low-level vision tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
raw image generation
inverse ISP
text-to-raw
physically consistent synthesis