AI Summary
Existing inverse rendering methods based on pretrained diffusion models rely on noisy image inputs; however, structural and appearance information is often degraded during the denoising process, leading to unstable and low-quality intrinsic property predictions. This work addresses indoor scenes by proposing a deterministic, noise-free inverse rendering framework that abandons the conventional "noising → denoising → prediction" paradigm. Instead, it directly takes clean input images and employs flow matching for deterministic, end-to-end prediction of intrinsic properties (e.g., albedo, normal, depth). A generative rendering module is further introduced to enforce physical consistency, ensuring predicted attributes can be forward-rendered to reconstruct the original image faithfully. Evaluated on both synthetic and real-world indoor datasets, our method significantly outperforms state-of-the-art approaches, achieving notable improvements in geometric fidelity, texture quality, and robustness to domain shifts.
Abstract
Recent methods have shown that pre-trained diffusion models can be fine-tuned to enable generative inverse rendering by learning an image-conditioned noise-to-intrinsic mapping. Despite remarkable progress, they struggle to robustly produce high-quality results, because the noise-to-intrinsic paradigm essentially predicts intrinsic properties from noisy images with deteriorated structure and appearance, even though structure and appearance information is well known to be crucial for inverse rendering. To address this issue, we present DNF-Intrinsic, a robust yet efficient inverse rendering approach fine-tuned from a pre-trained diffusion model, which takes the source image rather than Gaussian noise as input and directly predicts deterministic intrinsic properties via flow matching. Moreover, we design a generative renderer that constrains the predicted intrinsic properties to be physically faithful to the source image. Experiments on both synthetic and real-world datasets show that our method clearly outperforms existing state-of-the-art methods.
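To make the image-to-intrinsic idea concrete, the following is a minimal sketch of a generic straight-path flow matching objective in which the path starts at the source image latent instead of Gaussian noise. It is an illustration under stated assumptions, not the paper's implementation: the linear interpolation, the Euler sampler, the latent shapes, and all function names are hypothetical choices, and the "oracle" velocity function merely stands in for a fine-tuned network.

```python
import numpy as np

def interpolate(x_src, x_tgt, t):
    """Point at time t in [0, 1] on the straight path from source to target latent."""
    return (1.0 - t) * x_src + t * x_tgt

def target_velocity(x_src, x_tgt):
    """Constant velocity of the straight path; the regression target for the network."""
    return x_tgt - x_src

def flow_matching_loss(predict_velocity, x_src, x_tgt, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x_src, x_tgt, t)
    v_pred = predict_velocity(x_t, t)
    return float(np.mean((v_pred - target_velocity(x_src, x_tgt)) ** 2))

def euler_integrate(predict_velocity, x_src, num_steps=4):
    """Deterministic inference: start at the clean source latent, integrate to t = 1."""
    x = x_src.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * predict_velocity(x, i * dt)
    return x

rng = np.random.default_rng(0)
image_latent = rng.normal(size=(4, 8, 8))   # hypothetical encoded source image
albedo_latent = rng.normal(size=(4, 8, 8))  # hypothetical encoded ground-truth albedo

# Oracle returning the exact target velocity; a placeholder for a trained network,
# used here only to check that the loss and the sampler are wired up consistently.
def oracle(x_t, t):
    return albedo_latent - image_latent

loss = flow_matching_loss(oracle, image_latent, albedo_latent, t=0.3)
prediction = euler_integrate(oracle, image_latent)
```

Because no noise is sampled at either end of the path, the same input always yields the same prediction, which is the sense in which such a formulation is deterministic. In practice the oracle would be replaced by a learned model and `t` would be drawn at random during training.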