RawGen: Learning Camera Raw Image Generation

📅 2026-03-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing generative models struggle to synthesize physically consistent camera raw images, which hinders progress in low-level vision tasks. This work proposes RawGen, which the authors describe as, to their knowledge, the first diffusion framework capable of both text-to-raw image generation and sRGB-to-raw inverse mapping. RawGen leverages the generative priors of large-scale sRGB diffusion models and fine-tunes a conditional denoiser and a dedicated decoder to produce physically plausible linear raw images through processing in both latent and pixel spaces. To overcome the limitation of fixed ISP assumptions, the authors construct a many-to-one inverse-ISP dataset in which multiple sRGB renditions of a scene, rendered with diverse ISP parameters, map to a single scene-referred target. Experiments show that RawGen outperforms traditional inverse-ISP methods that assume a fixed ISP, and that its synthetic data can benefit downstream low-level vision tasks.

📝 Abstract

Cameras capture scene-referred linear raw images, which are processed by onboard image signal processors (ISPs) into display-referred 8-bit sRGB outputs. Although raw data is more faithful for low-level vision tasks, collecting large-scale raw datasets remains a major bottleneck, as existing datasets are limited and tied to specific camera hardware. Generative models offer a promising way to address this scarcity; however, existing diffusion frameworks are designed to synthesize photo-finished sRGB images rather than physically consistent linear representations. This paper presents RawGen, to our knowledge the first diffusion-based framework enabling text-to-raw generation for arbitrary target cameras, alongside sRGB-to-raw inversion. RawGen leverages the generative priors of large-scale sRGB diffusion models to synthesize physically meaningful linear outputs, such as CIE XYZ or camera-specific raw representations, via specialized processing in latent and pixel spaces. To handle unknown and diverse ISP pipelines and photo-finishing effects in diffusion-model training data, we build a many-to-one inverse-ISP dataset where multiple sRGB renditions of the same scene generated using diverse ISP parameters are anchored to a common scene-referred target. Fine-tuning a conditional denoiser and specialized decoder on this dataset allows RawGen to obtain camera-centric linear reconstructions that effectively invert the rendering pipeline. We demonstrate RawGen's superior performance over traditional inverse-ISP methods that assume a fixed ISP. Furthermore, we show that augmenting training pipelines with RawGen's scalable, text-driven synthetic data can benefit downstream low-level vision tasks.
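
The many-to-one pairing described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy ISP below (white balance, exposure, sRGB gamma, a linear contrast curve, 8-bit quantization), the parameter ranges, and every function name are assumptions chosen only to show how several sRGB renditions can share one scene-referred linear target.

```python
import random


def srgb_oetf(x):
    """Standard sRGB transfer function (linear -> display-encoded) for x in [0, 1]."""
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1 / 2.4) - 0.055


def clip01(x):
    """Clamp a value to the [0, 1] range."""
    return min(max(x, 0.0), 1.0)


def toy_isp(linear_rgb, wb_gains, exposure, contrast):
    """Render one linear RGB pixel to 8-bit sRGB with a hypothetical, simplified ISP."""
    out = []
    for value, gain in zip(linear_rgb, wb_gains):
        v = clip01(value * gain * exposure)        # white balance and exposure, then clip
        v = srgb_oetf(v)                           # gamma encoding
        v = clip01(0.5 + contrast * (v - 0.5))     # simple contrast curve (assumed)
        out.append(round(v * 255))                 # 8-bit quantization
    return tuple(out)


def sample_isp_params(rng):
    """Draw one random ISP configuration; the ranges are illustrative assumptions."""
    return {
        "wb_gains": (rng.uniform(0.8, 1.2), 1.0, rng.uniform(0.8, 1.2)),
        "exposure": rng.uniform(0.7, 1.4),
        "contrast": rng.uniform(0.9, 1.2),
    }


def make_many_to_one_pairs(linear_image, num_renditions, seed=0):
    """Return (sRGB rendition, linear target) pairs that all share the same target."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_renditions):
        params = sample_isp_params(rng)
        srgb = [toy_isp(pixel, **params) for pixel in linear_image]
        pairs.append((srgb, linear_image))  # many renditions, one common target
    return pairs


# A tiny "image" of three linear RGB pixels, rendered four different ways.
linear_image = [(0.20, 0.30, 0.10), (0.05, 0.05, 0.05), (0.60, 0.50, 0.40)]
pairs = make_many_to_one_pairs(linear_image, num_renditions=4)
for srgb, target in pairs:
    print(srgb, "->", target)
```

An inverse-ISP model trained on pairs like these has to map every rendition back to the same linear target, so it cannot rely on a single fixed rendering pipeline. A real ISP also involves steps omitted here, such as demosaicing, color-space conversion, and local tone mapping.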

Problem

Research questions and friction points this paper is trying to address.

raw image generation

data scarcity

camera-specific raw data

inverse ISP

low-level vision tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model

raw image generation

inverse ISP