🤖 AI Summary
This work addresses the limited cross-system effectiveness of existing view-poisoning attacks on 3D reconstruction, which backpropagate adversarial gradients through the pipeline as a whole and therefore fail to exploit vulnerabilities in specific modules. Focusing on the Structure-from-Motion (SfM) initialization stage, the study reveals, for the first time, its sensitivity to cross-view gradient inconsistency. By optimizing adversarial perturbations that introduce such inconsistencies at the projections of corresponding 3D points, the method disrupts keypoint detection and feature matching, which in turn corrupts pose estimation and triangulation. The attack transfers in black-box settings across diverse 3D reconstruction systems, including 3D Gaussian Splatting (3DGS) and NeRF, surpassing the single-view baseline by 25.1% in PSNR and 16.5% in SSIM under cross-system transfer. The work also provides a theoretical analysis connecting cross-view inconsistency to correspondence collapse.
📝 Abstract
Poisoning the input views of 3D reconstruction systems has recently been studied. However, we observe that existing studies simply backpropagate adversarial gradients through the 3D reconstruction pipeline as a whole, without uncovering new vulnerabilities rooted in specific modules of the pipeline. In this paper, we argue that structure-from-motion (SfM) initialization, as the geometric core of many widely used reconstruction systems, can be targeted to achieve transferable poisoning effects across diverse 3D reconstruction systems. To this end, we propose PoInit-of-View, which optimizes adversarial perturbations to intentionally introduce cross-view gradient inconsistencies at the projections of corresponding 3D points. These inconsistencies disrupt keypoint detection and feature matching, thereby corrupting pose estimation and triangulation within SfM and eventually resulting in low-quality rendered views. We also provide a theoretical analysis that connects cross-view inconsistency to correspondence collapse. Experimental results demonstrate the effectiveness of PoInit-of-View on diverse 3D reconstruction systems and datasets, surpassing the single-view baseline by 25.1% in PSNR and 16.5% in SSIM in black-box transfer settings, such as from 3DGS to NeRF.
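To make the notion of cross-view gradient inconsistency concrete, the following toy sketch measures how much the image gradients of two views disagree at pixels that are projections of the same 3D points. It is an illustration only and not the PoInit-of-View objective, which the abstract does not specify: the function names, the use of cosine distance, and the assumption that correspondences are given as integer pixel coordinates are all hypothetical choices made for this example.

```python
import numpy as np


def image_gradients(img):
    """Central-difference gradients of a 2D grayscale image.

    Returns (d/drow, d/dcol), each with the same shape as img.
    """
    g_row, g_col = np.gradient(np.asarray(img, dtype=np.float64))
    return g_row, g_col


def gradient_inconsistency(view_a, view_b, pts_a, pts_b, eps=1e-8):
    """Mean cosine distance between gradients at corresponding pixels.

    pts_a, pts_b: (N, 2) integer arrays of (row, col) coordinates, where
    row i of each array is assumed to be the projection of the same 3D point.
    The score is near 0 when gradients agree and near 2 when they are opposite.
    (Hypothetical measure chosen for illustration.)
    """
    gr_a, gc_a = image_gradients(view_a)
    gr_b, gc_b = image_gradients(view_b)

    # Gather the 2D gradient vector at each corresponding pixel.
    ga = np.stack([gr_a[pts_a[:, 0], pts_a[:, 1]],
                   gc_a[pts_a[:, 0], pts_a[:, 1]]], axis=1)
    gb = np.stack([gr_b[pts_b[:, 0], pts_b[:, 1]],
                   gc_b[pts_b[:, 0], pts_b[:, 1]]], axis=1)

    dot = (ga * gb).sum(axis=1)
    norms = np.linalg.norm(ga, axis=1) * np.linalg.norm(gb, axis=1)
    cosine = dot / (norms + eps)
    return float(np.mean(1.0 - cosine))


# Toy data: a horizontal intensity ramp seen identically in both views.
ramp = np.tile(np.arange(8.0), (8, 1))
pts = np.array([[2, 3], [4, 5]])

consistent = gradient_inconsistency(ramp, ramp, pts, pts)
# Reversing the ramp in the second view flips its gradient direction.
inconsistent = gradient_inconsistency(ramp, -ramp, pts, pts)
print(consistent, inconsistent)
```

In an attack setting one would presumably maximize a score of this kind over bounded perturbations of the input views, so that detectors and matchers that rely on local gradient structure no longer agree across views. How the paper formulates and optimizes its objective is not described in the abstract.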