🤖 AI Summary
Monocular surface normal estimation degrades significantly on challenging surfaces that are reflective, textureless, or dark. This work proposes Poppy, a training-free, plug-and-play framework that uses a single polarimetric observation at test time to refine normal predictions through per-pixel optimization. Using differentiable rendering, the method aligns the output of a frozen, pre-trained RGB-based normal estimator with physically consistent polarization cues. It is the first approach to effectively integrate polarization-based physical priors into existing normal estimators without requiring multi-view inputs or specialized training data. Evaluated across seven benchmarks and three backbone architectures, the method reduces mean angular error by 23–26% on synthetic data and 6–16% on real-world data.
📝 Abstract
Monocular surface normal estimators trained on large-scale RGB-normal data often perform poorly on reflective, textureless, and dark surfaces. Polarization encodes surface orientation independently of texture and albedo, offering a physics-based complement for these cases. Existing polarization methods, however, require multi-view capture or specialized training data, limiting generalization. We introduce Poppy, a training-free framework that refines normals from any frozen RGB backbone using single-shot polarization measurements at test time. Keeping backbone weights frozen, Poppy optimizes per-pixel offsets to the input RGB image and the output normals, along with a learned reflectance decomposition. A differentiable rendering layer converts the refined normals into polarization predictions and penalizes mismatches with the observed signal. Across seven benchmarks and three backbone architectures (diffusion, flow, and feed-forward), Poppy reduces mean angular error by 23–26% on synthetic data and 6–16% on real data. These results show that guiding learned RGB-based normal estimators with polarization cues at test time refines normals on challenging surfaces without retraining.
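
To make the rendering step concrete, the following is a minimal sketch of how normals can be mapped to polarization predictions and compared with an observation. It is not Poppy's actual rendering layer: it assumes a purely diffuse polarization model, an orthographic camera looking along +z, and a fixed refractive index of 1.5, none of which are stated in the abstract. The function names `normals_to_polarization` and `polarization_loss`, and the loss weighting, are hypothetical.

```python
import numpy as np

def normals_to_polarization(normals, ior=1.5):
    """Map normals of shape (..., 3) to (AoLP, DoLP).

    Assumptions (not from the paper): diffuse reflection only,
    viewing direction +z, constant refractive index `ior`.
    """
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    nx, ny, nz = n[..., 0], n[..., 1], n[..., 2]
    # Angle of linear polarization: the normal's azimuth, defined modulo pi.
    aolp = np.mod(np.arctan2(ny, nx), np.pi)
    # Zenith angle between the normal and the viewing direction.
    cos_t = np.clip(np.abs(nz), 0.0, 1.0)
    sin2_t = 1.0 - cos_t**2
    # Standard diffuse degree-of-linear-polarization expression.
    num = (ior - 1.0 / ior) ** 2 * sin2_t
    den = (2.0 + 2.0 * ior**2
           - (ior + 1.0 / ior) ** 2 * sin2_t
           + 4.0 * cos_t * np.sqrt(ior**2 - sin2_t))
    return aolp, num / den

def polarization_loss(normals, aolp_obs, dolp_obs, ior=1.5):
    """Mismatch between predicted and observed polarization (hypothetical weighting)."""
    aolp, dolp = normals_to_polarization(normals, ior)
    # Doubled-angle comparison so that angles 0 and pi count as identical.
    angle_term = 1.0 - np.cos(2.0 * (aolp - aolp_obs))
    # Weight the angle term by observed DoLP: AoLP is unreliable where DoLP is near 0.
    return float(np.mean(dolp_obs * angle_term + (dolp - dolp_obs) ** 2))
```

In a test-time refinement loop, a loss of this kind would be minimized with respect to per-pixel offsets using automatic differentiation (for example, by porting the same expressions to an autodiff framework) while the backbone weights stay frozen. Note that AoLP determines the azimuth only up to a shift of pi, which is one reason a prior from the RGB backbone is useful.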