🤖 AI Summary
Existing photometric stereo methods suffer from insufficient multi-stage feature extraction and weak cross-stage interaction, leading to feature redundancy in complex regions (e.g., wrinkles and object boundaries) and limited normal estimation accuracy. To address this, we propose a multi-stage feature extraction and selective fusion framework: (1) a hierarchical feature extraction network that progressively enhances geometric detail representation; (2) a selective update mechanism that suppresses redundant feature responses; and (3) a cross-stage feature fusion module enabling adaptive interaction between deep semantic and shallow texture features. Evaluated on the DiLiGenT benchmark, our method achieves significant improvements over state-of-the-art approaches, reducing mean angular error (MAE) by 12.3%. Notably, gains are most pronounced in high-curvature regions. This work establishes a more robust and fine-grained feature modeling paradigm for learning-based photometric stereo.
📝 Abstract
Photometric stereo is a technique for estimating surface normals from the shading cues in images captured under different lighting conditions. However, existing learning-based approaches often fail to accurately capture features at multiple stages and do not adequately promote interaction between those features. Consequently, these models tend to extract redundant features, especially in areas with intricate details such as wrinkles and edges. To tackle these issues, we propose MSF-Net, a novel framework that extracts information at multiple stages and pairs it with a selective update strategy, yielding the high-quality features that are critical for accurate normal estimation. Additionally, we develop a feature fusion module to improve the interplay among features from different stages. Experimental results on the DiLiGenT benchmark show that the proposed MSF-Net significantly surpasses previous state-of-the-art methods in the accuracy of surface normal estimation.
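The abstract does not give the exact form of the selective update or the cross-stage fusion. As a minimal sketch of the general idea, assuming (hypothetically) an element-wise sigmoid gate that decides how much of the shallow (texture) feature to merge into the deep (semantic) feature, the fusion could look like:

```python
import numpy as np

def selective_fusion(deep, shallow):
    """Hypothetical sketch of gated cross-stage fusion (not the paper's
    exact module): a sigmoid gate computed from both feature maps picks,
    per element, a convex mix of the deep and shallow features."""
    gate = 1.0 / (1.0 + np.exp(-(deep + shallow)))  # element-wise gate in (0, 1)
    return gate * deep + (1.0 - gate) * shallow     # convex combination

rng = np.random.default_rng(0)
deep = rng.standard_normal((4, 8))     # deep-stage (semantic) features
shallow = rng.standard_normal((4, 8))  # shallow-stage (texture) features
fused = selective_fusion(deep, shallow)
print(fused.shape)
```

Because the gate lies in (0, 1), each fused value stays between the corresponding deep and shallow values, which is one simple way a fusion module can suppress redundant responses rather than summing features unconditionally.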