🤖 AI Summary
Multi-view structure from motion (SfM) in real-world scenarios often receives point tracks contaminated by a high proportion of outliers, and existing deep learning approaches are not robust to such corruption. To address this, we extend a matrix-equivariant SfM architecture with (1) an inlier/outlier classification module that preserves the model's equivariance, so that outlier detection stays consistent under reordering of cameras and points; and (2) a robust bundle adjustment step that refines the camera poses and 3D structure. Evaluated on realistic noisy tracks extracted by SuperPoint+SuperGlue, our method improves reconstruction accuracy over state-of-the-art baselines. Notably, it runs stably and reliably on datasets comprising over one thousand images, making it suitable for large-scale, robust SfM.
📝 Abstract
Multiview Structure from Motion is a fundamental and challenging computer vision problem. A recent deep learning-based approach utilized matrix equivariant architectures for the simultaneous recovery of camera poses and 3D scene structure from large image collections. That work, however, made the unrealistic assumption that the point tracks given as input are free of outliers. Here we propose an architecture suited to dealing with outliers by adding an inlier/outlier classification module that respects the model's equivariance and by adding a robust bundle adjustment step. Experiments demonstrate that our method can be successfully applied in realistic settings that involve large image collections and point tracks that are extracted with common heuristics and contain many outliers.
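To make the idea of an equivariance-respecting inlier/outlier classifier concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the layer form (a linear map of each entry plus its row, column, and global means), the two-layer depth, the random weights, and the names `equivariant_layer` and `inlier_scores` are all assumptions chosen for illustration. It also assumes a dense track tensor, whereas real point tracks are sparse because most points are seen by only a few cameras.

```python
import numpy as np

def equivariant_layer(x, w_self, w_row, w_col, w_all):
    """Map a (cameras, points, d) tensor to (cameras, points, k).

    Each output entry depends on its own input entry plus the mean over its
    row (camera), its column (point), and the whole tensor, so permuting
    cameras or points permutes the output in the same way.
    """
    row_mean = x.mean(axis=1, keepdims=True)       # (m, 1, d): per-camera mean
    col_mean = x.mean(axis=0, keepdims=True)       # (1, n, d): per-point mean
    all_mean = x.mean(axis=(0, 1), keepdims=True)  # (1, 1, d): global mean
    return x @ w_self + row_mean @ w_row + col_mean @ w_col + all_mean @ w_all

def inlier_scores(x, params):
    """Hypothetical two-layer classifier giving one score in (0, 1) per entry."""
    hidden = np.maximum(equivariant_layer(x, *params[0]), 0.0)  # ReLU
    logits = equivariant_layer(hidden, *params[1])[..., 0]
    return 1.0 / (1.0 + np.exp(-logits))                        # sigmoid

rng = np.random.default_rng(0)
m, n, d, k = 4, 6, 2, 8                  # cameras, points, input dim, hidden dim
tracks = rng.normal(size=(m, n, d))      # stand-in for 2D observations
params = [
    tuple(rng.normal(size=(d, k)) * 0.1 for _ in range(4)),  # untrained weights
    tuple(rng.normal(size=(k, 1)) * 0.1 for _ in range(4)),
]

scores = inlier_scores(tracks, params)

# Equivariance check: reorder cameras and points, then compare the scores.
cam_perm = rng.permutation(m)
pt_perm = rng.permutation(n)
scores_perm = inlier_scores(tracks[cam_perm][:, pt_perm], params)
is_equivariant = np.allclose(scores_perm, scores[cam_perm][:, pt_perm])
```

In a pipeline of the kind the abstract describes, scores like these would be thresholded to discard suspected outlier observations before the robust bundle adjustment step; the threshold and the training loss are left out here because the abstract does not specify them.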