🤖 AI Summary
Scene Coordinate Regression (SCR) estimates a 3D coordinate for every pixel, so its accuracy degrades when it is trained on NeRF or Gaussian Splatting renderings that contain blur and ghosting artifacts. To address this, we propose a dynamic pixel reliability filter that jointly evaluates the differentiable reprojection loss and gradient information during training, adaptively rejecting unreliable pixels. We further introduce sparse-input supervision, previously unexplored in SCR, into the training paradigm to jointly optimize rendering fidelity and coordinate regression accuracy. Our method requires no additional annotations and is compatible with mainstream neural rendering frameworks. Evaluated on multiple indoor and outdoor benchmark datasets, it achieves state-of-the-art performance, significantly improving camera pose estimation accuracy and cross-scene generalization.
📝 Abstract
Camera pose estimation can be improved by using novel view synthesis techniques such as NeRF and Gaussian Splatting to increase the diversity and coverage of the training data. However, these techniques often render images with blurring and ghosting artifacts, which makes the rendered images unreliable as training data. The problem is particularly pronounced for Scene Coordinate Regression (SCR) methods, which estimate 3D coordinates at the pixel level. To mitigate it, we introduce a novel filtering approach that selectively keeps well-rendered pixels and discards inferior ones. The filter measures the SCR model's real-time reprojection loss and gradient jointly during training. Building on this filtering technique, we also develop a new strategy for improving scene coordinate regression from sparse inputs, drawing on the success of sparse-input techniques in novel view synthesis. Our experimental results validate the effectiveness of our method, demonstrating state-of-the-art performance on indoor and outdoor datasets.
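The filtering idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact criterion, so everything here is an assumption made for illustration: the function names, the pinhole camera model, the use of a per-pixel gradient magnitude supplied by the caller, and the rule "keep a pixel only if both its reprojection error and its gradient magnitude fall below fixed thresholds". It is not the authors' implementation.

```python
import math


def world_to_camera(point_world, rotation, translation):
    """Apply a world-to-camera rigid transform.

    rotation is a 3x3 matrix as nested lists, translation a 3-vector.
    """
    return tuple(
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def reprojection_error(pixel, point_world, rotation, translation, intrinsics):
    """Pixel distance between an observed pixel and the projection of the
    predicted scene coordinate (pinhole model, assumed for this sketch).

    intrinsics is (fx, fy, cx, cy). Points at or behind the camera plane
    get an infinite error so that any finite threshold rejects them.
    """
    fx, fy, cx, cy = intrinsics
    x, y, z = world_to_camera(point_world, rotation, translation)
    if z <= 1e-6:
        return math.inf
    u = fx * x / z + cx
    v = fy * y / z + cy
    return math.hypot(u - pixel[0], v - pixel[1])


def filter_pixels(errors, grad_norms, max_error, max_grad):
    """Return one boolean per pixel: True means the pixel is kept.

    Hypothetical rule: a pixel is considered reliable only if both its
    reprojection error and its gradient magnitude are below the thresholds.
    """
    return [
        err <= max_error and grad <= max_grad
        for err, grad in zip(errors, grad_norms)
    ]


# Usage: identity pose, three rendered pixels all observed at (50, 50).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zero_t = [0.0, 0.0, 0.0]
intrinsics = (10.0, 10.0, 50.0, 50.0)
predicted_points = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 0.0, -1.0)]
errors = [
    reprojection_error((50.0, 50.0), p, identity, zero_t, intrinsics)
    for p in predicted_points
]
# Gradient magnitudes would come from the training framework; these are made up.
grad_norms = [0.1, 0.1, 0.1]
mask = filter_pixels(errors, grad_norms, max_error=2.0, max_grad=1.0)
```

In a real training loop the mask would be recomputed at every iteration, since both quantities change as the SCR model is updated; the thresholds above are placeholders rather than values from the paper.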