🤖 AI Summary
This work investigates whether the image modality is fundamentally necessary for single-view image-guided point cloud completion (SVIPC). Existing methods rely heavily on precisely aligned, viewpoint-specific images; to address this limitation, we propose the first fully view-free SVIPC paradigm. Our method takes only a partial point cloud as input and employs a viewpoint-agnostic point cloud encoder. To robustly capture geometric structure, we introduce a hierarchical self-fusion mechanism that integrates multi-stream geometric features via cross-attention and self-attention, realized within an attention-driven multi-branch encoder-decoder network. Evaluated on ShapeNet-ViPC, our approach outperforms all prior SVIPC methods. These results indicate that the single-view image is not essential for effective point cloud completion, challenging conventional assumptions in multimodal representation learning and offering both a novel theoretical insight and a practical pathway toward modality-robust 3D reconstruction.
📝 Abstract
The single-view image-guided point cloud completion (SVIPC) task aims to reconstruct a complete point cloud from a partial input with the help of a single-view image. While previous works have demonstrated the effectiveness of this multimodal approach, the fundamental necessity of image guidance remains largely unexamined. To explore this, we propose a strong view-free baseline for SVIPC: an attention-based multi-branch encoder-decoder network that takes only partial point clouds as input. Our hierarchical self-fusion mechanism, driven by cross-attention and self-attention layers, effectively integrates information across multiple streams, enriching feature representations and strengthening the network's ability to capture geometric structures. Extensive experiments and ablation studies on the ShapeNet-ViPC dataset demonstrate that our view-free framework outperforms state-of-the-art SVIPC methods. We hope our findings provide new insights into the development of multimodal learning for SVIPC. Our demo code will be available at https://github.com/Zhang-VISLab.
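The abstract does not specify the exact architecture, but the core idea of fusing multiple feature streams with cross-attention followed by self-attention refinement can be sketched minimally. The following NumPy example is illustrative only: the function names, token counts, and feature dimensions are hypothetical, and the real network would use learned projections and multiple hierarchical stages.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: (n_q, d) x (n_k, d) -> (n_q, d)
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def fuse_streams(a, b):
    """Hypothetical two-stream fusion: cross-attention in both
    directions, then self-attention over the concatenated tokens."""
    a_cross = attention(a, b, b)   # stream a attends to stream b
    b_cross = attention(b, a, a)   # stream b attends to stream a
    fused = np.concatenate([a_cross, b_cross], axis=0)
    return attention(fused, fused, fused)  # self-attention refinement

rng = np.random.default_rng(0)
a = rng.normal(size=(16, 32))  # stream 1: 16 tokens, 32-dim features
b = rng.normal(size=(16, 32))  # stream 2: same shape
fused = fuse_streams(a, b)
print(fused.shape)  # (32, 32): all tokens from both streams, refined
```

In a real implementation each attention call would carry learned query/key/value projections and residual connections; this sketch only shows how the two attention types compose into a single fusion step.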