AI Summary
This work addresses multi-view inverse rendering without photometric stereo cues, i.e., jointly recovering object geometry, spatially varying reflectance, and unknown illumination solely from unaligned multi-view images captured under arbitrary lighting. We propose the first single-stage, self-calibrating neural implicit framework: geometry and material are represented jointly by a signed distance field and reflectance latent codes, while shadow-aware volume rendering and an angularly encoded conditional reflectance network enable the coupled optimization of lighting, geometry, and reflectance. The method requires no light-source calibration, intermediate supervision, or photometric stereo assumptions, and it handles objects with complex geometry and materials. On both synthetic and real-world benchmarks, it achieves significant improvements in shape and illumination estimation accuracy, demonstrating strong generalization and robustness. This work establishes a new paradigm for unconstrained neural inverse rendering.
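To make the shadow-aware rendering component concrete, below is a minimal PyTorch sketch of how per-point light visibility can be queried directly from a learned signed distance field by marching a shadow ray toward the light. The function name `light_visibility`, the NeuS-style logistic density, and all step counts and scale constants are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def light_visibility(sdf_fn, x, light_dir, n_steps=64, t_max=2.0, s=50.0):
    """Approximate light visibility at points x by marching toward the light
    and accumulating SDF-derived opacity (NeuS-style logistic density).

    sdf_fn:    callable mapping (M, 3) points to (M,) signed distances
    x:         (N, 3) surface points
    light_dir: (N, 3) unit directions from each point toward the light
    """
    # Sample depths along each shadow ray, offset slightly from the surface
    # to avoid spurious self-shadowing at t = 0.
    ts = torch.linspace(1e-2, t_max, n_steps, device=x.device)        # (S,)
    pts = x[:, None, :] + ts[None, :, None] * light_dir[:, None, :]   # (N, S, 3)
    sdf = sdf_fn(pts.reshape(-1, 3)).reshape(x.shape[0], n_steps)     # (N, S)

    sigma = s * torch.sigmoid(-s * sdf)       # density: large inside the surface
    dt = ts[1] - ts[0]
    alpha = 1.0 - torch.exp(-sigma * dt)      # per-step opacity
    # Transmittance of the whole shadow ray: ~1 if lit, ~0 if occluded.
    return torch.prod(1.0 - alpha + 1e-7, dim=-1)
```

The radiance rendered at a surface point can then be attenuated by this visibility term, so cast shadows inform the joint optimization instead of corrupting the reflectance estimate.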
Abstract
We propose a neural inverse rendering approach that jointly reconstructs geometry, spatially varying reflectance, and lighting conditions from multi-view images captured under varying directional lighting. Unlike prior multi-view photometric stereo methods that require light calibration or intermediate cues such as per-view normal maps, our method jointly optimizes all scene parameters from raw images in a single stage. We represent both geometry and reflectance as neural implicit fields and apply shadow-aware volume rendering. A spatial network first predicts the signed distance and a reflectance latent code for each scene point. A reflectance network then estimates reflectance values conditioned on the latent code and angularly encoded surface normal, view, and light directions. The proposed method outperforms state-of-the-art normal-guided approaches in shape and lighting estimation accuracy, generalizes to view-unaligned multi-light images, and handles objects with challenging geometry and reflectance.
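As a rough illustration of this two-network design, the following PyTorch sketch pairs a spatial MLP, which outputs a signed distance and a reflectance latent code, with a reflectance MLP conditioned on that code and an angular encoding of the normal, view, and light directions. The class names, layer widths, and the choice of cosine features (n·l, n·v, n·h) as the angular encoding are assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialNet(nn.Module):
    """Maps a 3D scene point to a signed distance and a reflectance
    latent code. Depth and widths here are illustrative."""
    def __init__(self, latent_dim=32, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1 + latent_dim),
        )

    def forward(self, x):                        # x: (N, 3) scene points
        out = self.mlp(x)
        return out[:, :1], out[:, 1:]            # signed distance, latent code

class ReflectanceNet(nn.Module):
    """Predicts RGB reflectance from the latent code plus an angular
    encoding of the normal, view, and light directions. The encoding
    used here, cosines (n.l, n.v, n.h) with h the half vector, is a
    stand-in assumption."""
    def __init__(self, latent_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Softplus(),  # non-negative reflectance
        )

    def forward(self, latent, n, v, l):          # n, v, l: (N, 3) unit vectors
        h = F.normalize(v + l, dim=-1)           # half vector
        ang = torch.stack(
            [(n * l).sum(-1), (n * v).sum(-1), (n * h).sum(-1)], dim=-1)
        return self.mlp(torch.cat([latent, ang], dim=-1))
```

Conditioning on angles rather than raw direction vectors keeps the predicted reflectance invariant to rotations of the local frame, which is one common motivation for angular encodings of this kind.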