RawGen: Learning Camera Raw Image Generation

📅 2026-03-31
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing generative models struggle to synthesize physically consistent camera raw images, hindering progress in low-level vision tasks. This work proposes RawGen, which the authors describe as, to their knowledge, the first diffusion framework capable of both text-to-raw image generation and sRGB-to-raw inverse mapping. RawGen leverages the generative priors of large-scale sRGB diffusion models and combines multi-parameter ISP simulation, a conditional denoiser, and a dedicated decoder to produce physically plausible linear raw images through processing in both latent and pixel spaces. To avoid relying on a fixed ISP assumption, the authors construct a many-to-one inverse-ISP dataset in which multiple sRGB renditions of the same scene share one scene-referred target. Experiments show that RawGen outperforms traditional inverse-ISP methods that assume a fixed ISP, and that its synthetic data can improve performance on downstream low-level vision tasks.

📝 Abstract
Cameras capture scene-referred linear raw images, which are processed by onboard image signal processors (ISPs) into display-referred 8-bit sRGB outputs. Although raw data is more faithful for low-level vision tasks, collecting large-scale raw datasets remains a major bottleneck, as existing datasets are limited and tied to specific camera hardware. Generative models offer a promising way to address this scarcity. However, existing diffusion frameworks are designed to synthesize photo-finished sRGB images rather than physically consistent linear representations. This paper presents RawGen, to our knowledge the first diffusion-based framework enabling text-to-raw generation for arbitrary target cameras, alongside sRGB-to-raw inversion. RawGen leverages the generative priors of large-scale sRGB diffusion models to synthesize physically meaningful linear outputs, such as CIE XYZ or camera-specific raw representations, via specialized processing in latent and pixel spaces. To handle unknown and diverse ISP pipelines and photo-finishing effects in diffusion-model training data, we build a many-to-one inverse-ISP dataset where multiple sRGB renditions of the same scene, generated using diverse ISP parameters, are anchored to a common scene-referred target. Fine-tuning a conditional denoiser and specialized decoder on this dataset allows RawGen to obtain camera-centric linear reconstructions that effectively invert the rendering pipeline. We demonstrate RawGen's superior performance over traditional inverse-ISP methods that assume a fixed ISP. Furthermore, we show that augmenting training pipelines with RawGen's scalable, text-driven synthetic data can benefit downstream low-level vision tasks.
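
To make the many-to-one idea concrete, here is a minimal sketch of how several sRGB renditions of one scene could be paired with a single linear target. It is not the authors' pipeline: `toy_isp`, its parameters (`wb_gains`, `exposure`, `contrast`), the sampling ranges, and the three-pixel `scene` are all hypothetical stand-ins chosen for illustration. Only the sRGB transfer function follows the standard definition. The sketch also shows why an inverse that assumes a fixed ISP (plain sRGB decoding here) fails to recover the target once the rendering parameters vary.

```python
import random

def srgb_encode(x):
    """Standard sRGB transfer function, linear [0, 1] -> encoded [0, 1]."""
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1 / 2.4) - 0.055

def srgb_decode(v):
    """Inverse of srgb_encode, encoded [0, 1] -> linear [0, 1]."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

def toy_isp(linear_pixels, wb_gains, exposure, contrast):
    """Hypothetical forward ISP: gains, power-law tone tweak, sRGB encode, 8-bit quantize."""
    rendered = []
    for pixel in linear_pixels:
        channels = []
        for value, gain in zip(pixel, wb_gains):
            v = min(max(value * gain * exposure, 0.0), 1.0)  # clip to displayable range
            v = v ** contrast  # crude stand-in for photo-finishing
            channels.append(round(srgb_encode(v) * 255))
        rendered.append(tuple(channels))
    return rendered

def fixed_isp_inverse(srgb_pixels):
    """Baseline inverse that assumes plain sRGB encoding and nothing else."""
    return [tuple(srgb_decode(c / 255) for c in pixel) for pixel in srgb_pixels]

def build_many_to_one_pairs(linear_pixels, n_renditions, seed=0):
    """Render one linear scene with random ISP settings; every rendition shares one target."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_renditions):
        wb_gains = (rng.uniform(0.8, 1.2), 1.0, rng.uniform(0.8, 1.2))  # assumed ranges
        exposure = rng.uniform(0.7, 1.3)
        contrast = rng.uniform(0.8, 1.2)
        srgb = toy_isp(linear_pixels, wb_gains, exposure, contrast)
        pairs.append((srgb, linear_pixels))  # many renditions -> one target
    return pairs

def mean_abs_error(a, b):
    """Mean absolute difference over all channels of two pixel lists."""
    diffs = [abs(x - y) for pa, pb in zip(a, b) for x, y in zip(pa, pb)]
    return sum(diffs) / len(diffs)

# Three made-up linear RGB pixels standing in for a scene-referred image.
scene = [(0.05, 0.10, 0.20), (0.40, 0.35, 0.30), (0.70, 0.60, 0.50)]
pairs = build_many_to_one_pairs(scene, n_renditions=4)

for srgb, target in pairs:
    error = mean_abs_error(fixed_isp_inverse(srgb), target)
    print(f"rendition {srgb[0]} ... fixed-ISP inverse error: {error:.4f}")
```

Each printed error measures how far the fixed-ISP baseline lands from the shared target. A model trained on such pairs has to map every rendition back to the same linear values, which is the role the abstract assigns to the fine-tuned conditional denoiser and decoder.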
Problem

Research questions and friction points this paper is trying to address.

raw image generation
data scarcity
camera-specific raw data
inverse ISP
low-level vision tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
raw image generation
inverse ISP
text-to-raw
physically consistent synthesis