🤖 AI Summary
This work addresses the inherent ambiguity in disentangling reflectance, texture, and illumination from a single image. The authors propose a multi-object generative inverse rendering method that leverages the prior that all objects in a scene share a common illumination. By employing a diffusion model, the approach jointly recovers per-object reflectance and texture while estimating the shared lighting. Key innovations include a cascaded end-to-end architecture, a Coordinated Guidance mechanism, an Axial Attention module, and a Texture Extraction ControlNet, which together enable joint disentanglement in both image and angular spaces. This design effectively enforces illumination consistency across multiple objects while preserving high-frequency details. Experiments demonstrate that, given known geometry and a single input image containing multiple objects, the method significantly improves the accuracy and visual quality of material and lighting decomposition.
📝 Abstract
We introduce Multi-Object Generative Perception (MultiGP), a generative inverse rendering method for stochastic sampling of all radiometric constituents -- reflectance, texture, and illumination -- underlying object appearance in a single image. Our key idea for resolving this inherently ambiguous radiometric disentanglement is to leverage the fact that, while their textures and reflectances may differ, all objects in the same scene are lit by the same illumination. MultiGP exploits this consensus to produce samples of reflectance, texture, and illumination from a single image of known shapes, based on four key technical contributions: a cascaded end-to-end architecture that combines image-space and angular-space disentanglement; Coordinated Guidance, which drives diffusion sampling to converge on a single consistent illumination estimate; Axial Attention, applied to facilitate "cross-talk" between objects of different reflectance; and a Texture Extraction ControlNet that preserves high-frequency texture details while decoupling them from the estimated lighting. Experimental results demonstrate that MultiGP effectively leverages the complementary spatial and frequency characteristics of multiple object appearances to recover each object's texture and reflectance as well as the common illumination.
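To make the cross-object "cross-talk" idea concrete, the following is a minimal sketch of attention applied along the object axis: each spatial token attends only to the corresponding token of the other objects, which is how information about the shared illumination can flow between objects. This is an illustration only, not the authors' implementation -- the function name, the use of a single head, and the identity query/key/value projections are assumptions made for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention_over_objects(feats):
    """Self-attention along the object axis, independently per spatial token.

    feats: array of shape (n_objects, n_tokens, dim), one feature map per object.
    Returns an array of the same shape in which each token has attended to the
    corresponding token of every object (including itself).
    """
    n_obj, n_tok, d = feats.shape
    # Move the object axis inward so attention runs over objects: (n_tok, n_obj, dim).
    x = feats.transpose(1, 0, 2)
    # Single-head scaled dot-product attention with identity projections.
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)  # (n_tok, n_obj, n_obj)
    attn = softmax(scores, axis=-1)
    out = attn @ x                                  # (n_tok, n_obj, dim)
    return out.transpose(1, 0, 2)                   # back to (n_obj, n_tok, dim)
```

Note the contrast with full attention over all objects and tokens jointly: restricting attention to the object axis keeps the cost linear in the number of spatial tokens while still letting objects with different reflectance exchange evidence about the one illumination they share.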