🤖 AI Summary
This work addresses the challenge of strong geometry-albedo coupling and the complex optical properties of skin (particularly subsurface scattering) in sparse-view (only three images) 3D face reconstruction, both of which hinder effective disentanglement. To this end, we propose a two-stage differentiable decomposition framework: (i) we construct a generic face template and use it to guide personalized fine-tuning; (ii) we explicitly model joint geometry-albedo constraints and incorporate physics-based skin subsurface scattering; and (iii) we optimize end-to-end with physically inspired neural rendering. Compared to state-of-the-art approaches, our method achieves superior geometric accuracy, cleaner separation of diffuse and specular reflectance components, and higher novel-view synthesis quality. To the best of our knowledge, it is the first method to achieve high-fidelity, fully disentangled 3D face reconstruction from sparse input views.
📝 Abstract
In this study, we introduce a novel two-stage technique for decomposing and reconstructing faces from sparse-view images, a task made challenging by each individual's unique geometry and complex skin reflectance. To synthesize 3D facial models more realistically, we decouple key facial attributes from the observed RGB color, including geometry, diffuse reflectance, and specular reflectance. Specifically, we design a Sparse-view Face Decomposition Model (SFDM): 1) in the first stage, we build a general facial template from a wide array of individual faces, encapsulating essential geometric and reflectance characteristics; 2) guided by this template, in the second stage we refine a specific facial model for each individual, accounting for the interaction between geometry and reflectance as well as the effects of subsurface scattering in the skin. With these advances, our method can reconstruct high-quality facial representations from as few as three images. Comprehensive evaluations and comparisons show that our approach outperforms existing methods by effectively disentangling geometric and reflectance components, significantly enhancing the quality of synthesized novel views and paving the way for applications in facial relighting and reflectance editing.
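To make the diffuse/specular decomposition concrete, here is a minimal sketch using a standard Lambertian-plus-Blinn-Phong shading split. This is an illustrative assumption for exposition only: the function name `shade` and all parameters are hypothetical, and the paper's actual model is physically based and additionally accounts for subsurface scattering.

```python
import numpy as np

def shade(normal, light_dir, view_dir, diffuse_albedo, spec_intensity, shininess):
    """Toy additive decomposition of observed color into diffuse and
    specular components (Lambertian + Blinn-Phong). Illustrative sketch;
    not the paper's physically based, subsurface-scattering-aware model."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)  # half vector between light and view
    # Diffuse term: view-independent, scales with surface albedo
    diffuse = diffuse_albedo * max(np.dot(n, l), 0.0)
    # Specular term: view-dependent highlight
    specular = spec_intensity * max(np.dot(n, h), 0.0) ** shininess
    return diffuse + specular, diffuse, specular
```

In a decomposition framework like SFDM, the reconstruction objective is to recover the per-pixel factors (geometry via `normal`, plus the diffuse and specular reflectance terms) such that their re-rendered sum matches the observed RGB images.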