🤖 AI Summary
Authoring realistic appearance detail such as wear, aging, and weathering is tedious, and naively adding it with a generative model yields appearance that is geometrically inconsistent across multi-view observations. This paper addresses both problems with a fine-tuning-free, differentiable inverse rendering framework. Methodologically, it couples a pre-trained text-to-image diffusion model (e.g., Stable Diffusion) with multi-view differentiable rendering, using UV-space-consistent noise initialization and projection-constrained attention to enforce cross-view geometric and appearance alignment. Combined with text-conditioned guidance and backpropagation through the renderer, the framework directly optimizes physically based rendering (PBR) texture maps, including albedo, normal, and roughness, in 2D UV space. Diffusion priors are thus embedded into the inverse rendering pipeline without any training or fine-tuning of the diffusion model, the material parameters stay differentiable, and the resulting textures remain artist-editable, significantly reducing the cost of producing high-fidelity PBR materials.
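Below is a minimal sketch of the optimization loop this summary describes, written in PyTorch under explicit assumptions: `render_views` (a differentiable multi-view renderer), `enhance_views` (a diffusion-based detail enhancer), and `cameras` are hypothetical stand-ins for components the paper does not expose as an API, and the L1 image loss and optimizer settings are illustrative choices rather than the paper's stated objective.

```python
# Hypothetical sketch of diffusion-guided inverse rendering of PBR textures.
# `render_views` and `enhance_views` are assumed callables, not the paper's API.
import torch

def enhance_material(render_views, enhance_views, cameras, prompt,
                     uv_res=1024, n_iters=400, lr=1e-2):
    # Learnable 2D PBR textures in UV space: albedo (RGB), tangent-space normal, roughness.
    albedo    = torch.full((uv_res, uv_res, 3), 0.5, requires_grad=True)
    normal    = torch.tensor([0.5, 0.5, 1.0]).repeat(uv_res, uv_res, 1).requires_grad_(True)
    roughness = torch.full((uv_res, uv_res, 1), 0.5, requires_grad=True)
    textures  = {"albedo": albedo, "normal": normal, "roughness": roughness}
    opt = torch.optim.Adam(textures.values(), lr=lr)

    # Render the basic appearance once, then let the diffusion model add the
    # prompted detail (wear, aging, weathering) to every view.
    with torch.no_grad():
        base_views   = render_views(textures, cameras)
        target_views = enhance_views(base_views, prompt, cameras)

    # Inverse rendering: backpropagate the generated detail into the 2D textures.
    for _ in range(n_iters):
        opt.zero_grad()
        rendered = render_views(textures, cameras)   # differentiable w.r.t. the textures
        loss = torch.nn.functional.l1_loss(rendered, target_views)
        loss.backward()
        opt.step()
        with torch.no_grad():
            roughness.clamp_(0.0, 1.0)               # keep values physically plausible
    return textures
```

Because the optimization variables are ordinary 2D UV-space textures, the result stays compatible with standard PBR authoring workflows and can be edited further by artists.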
📝 Abstract
We present a tool for enhancing the detail of physically based materials using an off-the-shelf diffusion model and inverse rendering. Our goal is to enhance the visual fidelity of materials with detail that is often tedious to author, by adding signs of wear, aging, weathering, etc. As these appearance details are often rooted in real-world processes, we leverage a generative image model trained on a large dataset of natural images that capture the corresponding visual context. Starting with a given geometry, UV mapping, and basic appearance, we render multiple views of the object. We use these views, together with an appearance-defining text prompt, to condition a diffusion model. The details it generates are then backpropagated from the enhanced images to the material parameters via inverse differentiable rendering. For inverse rendering to be successful, the generated appearance has to be consistent across all the images. We propose two priors to address the multi-view consistency of the diffusion model. First, we ensure that the initial noise that seeds the diffusion process is itself consistent across views by integrating it from a view-independent UV space. Second, we enforce geometric consistency by biasing the attention mechanism via a projective constraint so that pixels attend strongly to their corresponding pixel locations in other views. Our approach does not require any training or finetuning of the diffusion model, is agnostic to the material model used, and the enhanced material properties, i.e., 2D PBR textures, can be further edited by artists.
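The two consistency priors can be illustrated with a short, assumption-laden sketch rather than the paper's implementation: `uv_consistent_noise` gathers one shared UV-space Gaussian noise image into each view with a nearest-neighbor lookup (the paper instead integrates the noise from UV space, which better preserves its statistics), and `biased_cross_view_attention` adds a constant bias to the attention logits at projected pixel correspondences, with the correspondence map `corr_index` assumed to be precomputed from the known geometry and cameras.

```python
# Sketches of the two multi-view consistency priors, assuming per-pixel UV maps
# and cross-view pixel correspondences are already available from the rasterizer.
import torch
import torch.nn.functional as F

def uv_consistent_noise(uv_maps, uv_res=512, channels=4):
    """uv_maps: (n_views, H, W, 2) per-pixel UV coordinates in [0, 1].
    Returns per-view initial noise gathered from one shared UV-space noise image,
    so the same surface point is seeded with the same noise in every view.
    (Simplification: a plain lookup does not preserve the noise statistics as
    faithfully as integrating the noise from UV space, as the paper does.)"""
    n_views = uv_maps.shape[0]
    uv_noise = torch.randn(1, channels, uv_res, uv_res)   # view-independent noise
    grid = uv_maps * 2.0 - 1.0                            # grid_sample expects [-1, 1]
    return F.grid_sample(uv_noise.expand(n_views, -1, -1, -1), grid,
                         mode="nearest", align_corners=False)

def biased_cross_view_attention(q, k, v, corr_index, bias_strength=4.0):
    """q: (n_a, d) tokens of view A; k, v: (n_b, d) tokens of view B.
    corr_index: (n_a,) index of the pixel in view B onto which each pixel of
    view A projects, or -1 where no correspondence exists. The bias pushes each
    pixel to attend strongly to its geometric counterpart in the other view."""
    logits = (q @ k.t()) / q.shape[-1] ** 0.5
    bias = torch.zeros_like(logits)
    rows = torch.nonzero(corr_index >= 0, as_tuple=True)[0]
    bias[rows, corr_index[rows]] = bias_strength
    return torch.softmax(logits + bias, dim=-1) @ v

# Toy usage with random stand-ins for rasterized UVs and correspondences.
noise = uv_consistent_noise(torch.rand(4, 64, 64, 2))     # (4, 4, 64, 64)
q, k, v = torch.randn(3, 100, 32).unbind(0)
out = biased_cross_view_attention(q, k, v, torch.randint(-1, 100, (100,)))
```

In this sketch, biasing rather than hard-masking the attention lets each pixel still draw on the rest of the other view while favoring its geometric correspondence.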