AI Summary
Existing rendering and inverse rendering methods suffer from low accuracy, high computational cost, and inherent ambiguities in the ill-posed inverse problem, while remaining largely decoupled in modeling. This paper introduces the first unified framework treating both tasks as bidirectional conditional generation within a shared latent space. We propose a dual-schedule diffusion architecture featuring a two-stream design, cycle-consistency constraints to mitigate inverse ambiguity, and a cross-stream modulation module for joint optimization across conditions. Leveraging a newly curated high-quality dataset of intrinsic attribute-image pairs, our method significantly improves decomposition accuracy for material, lighting, and other attributes, achieving high-fidelity novel-view synthesis and robust inverse inference across diverse scenes. The code is publicly released to advance the field.
Abstract
Rendering and inverse rendering are pivotal tasks in both computer vision and graphics. The rendering equation lies at the core of both tasks, serving as an ideal conditional distribution transfer function from intrinsic properties to RGB images. Although existing rendering methods achieve promising results, they merely approximate this ideal estimation for a specific scene and come with a high computational cost. Additionally, the inverse conditional distribution transfer is intractable due to its inherent ambiguity. To address these challenges, we propose a data-driven method that jointly models rendering and inverse rendering as two conditional generation tasks within a single diffusion framework. Inspired by UniDiffuser, we utilize two distinct time schedules to model both tasks, and with a tailored dual streaming module, we achieve cross-conditioning of two pre-trained diffusion models. This unified approach, named Uni-Renderer, allows the two processes to facilitate each other through a cycle-consistency constraint, mitigating ambiguity by enforcing consistency between intrinsic properties and rendered images. Combined with a meticulously prepared dataset, our method effectively decomposes intrinsic properties and demonstrates a strong capability to recognize changes during rendering. We will open-source our training and inference code to the public, fostering further research and development in this area.
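The "two distinct time schedules" idea can be illustrated with a minimal sketch. This is not the Uni-Renderer implementation: it only shows the UniDiffuser-style convention in which each modality gets its own timestep, and holding one modality at timestep 0 (clean) turns the joint model into a conditional one. The linear beta schedule, the value of `T`, and the names `make_alpha_bar`, `add_noise`, `sample_timesteps`, and `noisy_pair` are all assumptions introduced for illustration, and flat Python lists stand in for image and attribute latents.

```python
import math
import random

T = 1000  # assumed number of diffusion steps (hypothetical value)


def make_alpha_bar(num_steps=T, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal fraction for a linear beta schedule.

    Index 0 is defined as 1.0, meaning "no noise added".
    """
    alpha_bar = [1.0]
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))
    return alpha_bar


ALPHA_BAR = make_alpha_bar()


def add_noise(x0, t, rng):
    """Forward-noise the vector x0 to timestep t; t = 0 leaves the values unchanged."""
    a = ALPHA_BAR[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


def sample_timesteps(task, rng):
    """Return (t_image, t_attributes) for one training example of the given task."""
    t = rng.randint(1, T)  # inclusive on both ends
    if task == "rendering":
        # Attributes are the clean condition; the image is the noised target.
        return t, 0
    if task == "inverse_rendering":
        # The image is the clean condition; the attributes are the noised target.
        return 0, t
    raise ValueError(f"unknown task: {task!r}")


def noisy_pair(image, attrs, task, rng):
    """Noise an (image, attributes) pair with task-dependent timesteps."""
    t_img, t_attr = sample_timesteps(task, rng)
    return (add_noise(image, t_img, rng), t_img), (add_noise(attrs, t_attr, rng), t_attr)


if __name__ == "__main__":
    rng = random.Random(0)
    image = [rng.gauss(0.0, 1.0) for _ in range(16)]  # stand-in image latent
    attrs = [rng.gauss(0.0, 1.0) for _ in range(16)]  # stand-in attribute latent
    for task in ("rendering", "inverse_rendering"):
        (_, t_img), (_, t_attr) = noisy_pair(image, attrs, task, rng)
        print(task, "t_image =", t_img, "t_attributes =", t_attr)
```

In this sketch a single denoising network would receive both noised inputs together with both timesteps, so the same weights can serve either direction depending on which timestep is pinned to 0. How the dual streaming module and the cycle-consistency constraint use these inputs is not specified in the abstract and is not modeled here.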