GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic

📅 2026-04-10
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the “P2P Gap”, the disconnect between physically based rendering (PBR) and photorealistic rendering (PRR). Explicit modeling of PRR is hindered by the scarcity of high-fidelity digital assets of the real world, while implicit generative approaches struggle to maintain geometric consistency and user controllability. To bridge this gap, we propose GeRM, the first multimodal generative rendering model that unifies physical and perceptual realism. We formally define the P2P transfer problem, introduce the P2P-50K dataset of 50K paired samples, and present the novel concept of a Distribution Transfer Vector (DTV) Field. Leveraging a multi-condition ControlNet architecture conditioned on G-buffers, text prompts, and region-enhanced cues, GeRM enables continuous, controllable, and geometrically consistent image generation from PBR to PRR.
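
The summary does not spell out how the three conditions enter the network. As a rough illustration only, the sketch below assumes a ControlNet-style design in which each condition (a G-buffer feature, a text embedding, and a region-enhancement cue) is already encoded into a feature vector; the vectors are summed, passed through a zero-initialized projection, and added to the frozen base model's features. All names (`fuse_conditions`, `zero_projection`, `matvec`) and the toy vectors are hypothetical and are not taken from GeRM.

```python
from typing import List

Vector = List[float]


def zero_projection(dim: int) -> List[Vector]:
    """Hypothetical zero-initialized linear layer (ControlNet's 'zero convolution' idea)."""
    return [[0.0] * dim for _ in range(dim)]


def matvec(weights: List[Vector], x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def fuse_conditions(base: Vector, conditions: List[Vector], weights: List[Vector]) -> Vector:
    """Add a control residual built from several condition features to the base features.

    The condition features are summed element-wise, projected by `weights`, and
    added to the frozen base features. With zero-initialized weights the output
    equals `base`, so training starts from the unmodified base model.
    """
    combined = [sum(values) for values in zip(*conditions)]
    residual = matvec(weights, combined)
    return [b + r for b, r in zip(base, residual)]


base = [0.5, -1.0, 2.0]      # toy features from the frozen base model
gbuffer = [1.0, 0.0, 0.0]    # toy encoded G-buffer (assumed, e.g. normals/albedo)
text = [0.0, 1.0, 0.0]       # toy encoded text prompt
region = [0.0, 0.0, 1.0]     # toy encoded region-enhancement cue

w = zero_projection(3)
print(fuse_conditions(base, [gbuffer, text, region], w))  # [0.5, -1.0, 2.0]

# After some (hypothetical) training the projection is no longer zero,
# so the conditions start to steer the base features.
w[0][0] = 0.1
print(fuse_conditions(base, [gbuffer, text, region], w))
```

The zero initialization is the design choice worth noting: at the start of training the added branch contributes nothing, so the pretrained generator's behavior is preserved until the condition pathway learns a useful residual.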

πŸ“ Abstract
For decades, Physically-Based Rendering (PBR) has been the foundation of synthesizing photorealistic images, and it is therefore sometimes loosely referred to as Photorealistic Rendering (PRR). While PBR is indeed a mathematical simulation of light transport that guarantees physical realism, photorealism additionally relies on realistic digital models of the geometry and appearance of the real world, leaving a barely explored gap from PBR to PRR (P2P). Consequently, the path toward photorealism faces a critical dilemma: explicit simulation of PRR is encumbered by the unavailability of realistic digital models of real-world objects, while implicit generative models sacrifice controllability and geometric consistency. Based on this insight, this paper presents the problem, data, and approach for mitigating the P2P gap, followed by the first multi-modal generative rendering model, dubbed GeRM, to unify PBR and PRR. GeRM integrates physical attributes such as G-buffers with text prompts and uses progressive incremental injection to generate controllable photorealistic images, allowing users to fluidly navigate the continuum between strict physical fidelity and perceptual photorealism. Technically, we model the transition between PBR and PRR images as a distribution transfer and aim to learn a distribution transfer vector field (DTV Field) to guide this process. To define the learning objective, we first leverage a multi-agent VLM framework to construct an expert-guided pairwise P2P transfer dataset, named P2P-50K, in which each paired sample corresponds to a transfer vector in the DTV Field. Subsequently, we propose a multi-condition ControlNet to learn the DTV Field, which synthesizes PBR images and progressively transitions them into PRR images, guided by G-buffers, text prompts, and cues for enhanced regions.
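
The abstract frames PBR-to-PRR transfer as learning a distribution transfer vector field, where each paired sample contributes one transfer vector and users can stop anywhere along the PBR-to-PRR continuum. One way to read this, assumed here in the style of flow matching and not confirmed by the abstract, is: for a pair (x_pbr, x_prr) the target vector is x_prr - x_pbr along the straight path x_t = (1 - t) x_pbr + t x_prr, and inference integrates the field from t = 0 up to a user-chosen strength s. The sketch below substitutes the exact field of a single pair for the learned ControlNet; `transfer_vector`, `interpolate`, `oracle_field`, and `navigate` are hypothetical names.

```python
from typing import Callable, List

Vector = List[float]
Field = Callable[[Vector, float], Vector]


def transfer_vector(x_pbr: Vector, x_prr: Vector) -> Vector:
    """Target vector for one (PBR, PRR) pair under a straight-line path assumption."""
    return [b - a for a, b in zip(x_pbr, x_prr)]


def interpolate(x_pbr: Vector, x_prr: Vector, t: float) -> Vector:
    """Point at time t on the straight path from the PBR sample to the PRR sample."""
    return [(1.0 - t) * a + t * b for a, b in zip(x_pbr, x_prr)]


def oracle_field(x_pbr: Vector, x_prr: Vector) -> Field:
    """Stand-in for a learned field: returns the constant transfer vector of one pair."""
    v = transfer_vector(x_pbr, x_prr)
    return lambda x, t: v


def navigate(field: Field, x_start: Vector, strength: float, steps: int = 10) -> Vector:
    """Euler-integrate the field from t = 0 to t = strength.

    strength = 0 keeps the physically based input; strength = 1 moves it all
    the way to the photorealistic target (for an exact field).
    """
    x = list(x_start)
    dt = strength / steps
    for i in range(steps):
        v = field(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


x_pbr = [0.2, 0.4, 0.6]   # toy "PBR image" latent
x_prr = [0.3, 0.3, 0.9]   # toy paired "PRR image" latent
field = oracle_field(x_pbr, x_prr)

for s in (0.0, 0.5, 1.0):
    print(s, navigate(field, x_pbr, s))
```

With a learned field, the loop would query a network conditioned on the image and its conditions instead of `oracle_field`; the straight-line target is only one possible choice of path.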
Problem

Research questions and friction points this paper is trying to address.

Physically-Based Rendering
Photorealistic Rendering
P2P gap
generative rendering
geometric consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Rendering
Physically-Based Rendering
Photorealistic Rendering
Distribution Transfer
ControlNet
Jiayuan Lu
State Key Lab of CAD&CG, Zhejiang University and Zhejiang Lab, China
Rengan Xie
State Key Lab of CAD&CG, Zhejiang University and Zhejiang Lab, China
Xuancheng Jin
State Key Lab of CAD&CG, Zhejiang University and Zhejiang Lab, China
Zhizhen Wu
State Key Lab of CAD&CG, Zhejiang University and Zhejiang Lab, China
Qi Ye
Zhejiang University
Computer Vision, Machine Learning
Tian Xie
Zhejiang University, China
Hujun Bao
State Key Lab of CAD&CG, Zhejiang University and Zhejiang Lab, China
Rui Wang
China University of Geosciences
Lithium ion batteries
Yuchi Huo
State Key Lab of CAD&CG, Zhejiang University and Zhejiang Lab, China