🤖 AI Summary
This work addresses the lack of explicit control over physically based shading and material properties in existing diffusion models, as well as the limited prompt-driven flexibility of traditional physically based rendering (PBR). To bridge this gap, the authors propose a unified stochastic differential equation (SDE) framework that establishes, for the first time, a theoretical connection between PBR and diffusion models. By leveraging the Central Limit Theorem, they derive a general SDE formulation for Monte Carlo path tracing, revealing an intrinsic relationship between noise variance and physical rendering attributes. Integrating path tracing directly into the diffusion process enables a physically consistent evolution from noise to image, significantly enhancing the physical controllability, fidelity, and generalization of generated results in both rendering and material editing tasks.
📝 Abstract
Diffusion-based image generators excel at producing realistic content from text or image conditions, but they offer only limited explicit control over low-level, physically grounded shading and material properties. In contrast, physically based rendering (PBR) offers fine-grained physical control but lacks prompt-driven flexibility. Although these two paradigms originate from distinct communities, both share a common evolution: from noisy observations to clean images. In this paper, we propose a unified stochastic formulation that bridges Monte Carlo rendering and diffusion-based generative modeling. First, we derive a general stochastic differential equation (SDE) formulation for Monte Carlo integration under the Central Limit Theorem. By instantiating it with physically based path tracing, we obtain a physically grounded SDE representation. Moreover, we provide a systematic analysis of how the physical characteristics of path tracing can be extended to existing diffusion models from the perspective of noise variance. Extensive experiments show that our method exerts physically grounded control over diffusion-generated results in tasks such as rendering and material editing.
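The statistical fact underlying this bridge is the Central Limit Theorem: the variance of a Monte Carlo estimator shrinks as σ²/N with sample count N, so a partially converged render behaves like a noisy observation whose noise level is controlled by a physical quantity. The sketch below is not the paper's method, only a minimal illustration of that variance scaling for a toy 1-D integral (the function names and parameters are illustrative):

```python
import random
import statistics

def mc_estimate(f, n, rng):
    """Monte Carlo estimate of the integral of f over [0, 1] with n samples."""
    return sum(f(rng.random()) for _ in range(n)) / n

def estimator_variance(f, n, trials, seed=0):
    """Empirical variance of the n-sample estimator over independent trials."""
    rng = random.Random(seed)
    estimates = [mc_estimate(f, n, rng) for _ in range(trials)]
    return statistics.variance(estimates)

f = lambda x: x * x  # toy integrand; true integral over [0, 1] is 1/3

# By the CLT, the estimator's variance scales as sigma^2 / n, so quadrupling
# the sample count should cut the variance by roughly a factor of four.
v_small = estimator_variance(f, n=16, trials=2000)
v_large = estimator_variance(f, n=64, trials=2000)
print(v_small / v_large)  # roughly 4
```

In a path tracer, the same 1/N decay governs per-pixel noise, which is what lets the number of traced paths play the role of a diffusion time/noise-level variable.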