🤖 AI Summary
This work addresses the limited generalization of existing point cloud completion methods to novel objects and real-world scenes. The authors propose MGPC, a unified multimodal architecture that integrates point clouds, RGB images, and textual descriptions. A modality dropout strategy improves robustness, a Transformer-based fusion module combines the three modalities, and a progressive generator improves the reconstruction of geometric detail. The authors also construct MGPC-1M, the first million-scale multimodal point cloud completion dataset, which covers over 1,000 categories and supports large-scale training and evaluation. Experiments on MGPC-1M and on in-the-wild data show that the proposed method consistently outperforms prior baselines and generalizes well under real-world conditions.
📝 Abstract
Point cloud completion aims to recover complete 3D geometry from partial observations caused by limited viewpoints and occlusions. Existing learning-based methods, including 3D Convolutional Neural Network (CNN)-based, point-based, and Transformer-based approaches, have achieved strong performance on synthetic benchmarks. However, because of limitations in input modality, scalability, and generative capacity, their generalization to novel objects and real-world scenarios remains challenging. In this paper, we propose MGPC, a generalizable multimodal point cloud completion framework that integrates point clouds, RGB images, and text within a unified architecture. MGPC introduces a modality dropout strategy, a Transformer-based fusion module, and a progressive generator to improve robustness, scalability, and geometric modeling capability. We further develop an automatic data generation pipeline and construct MGPC-1M, a large-scale benchmark with over 1,000 categories and one million training pairs. Extensive experiments on MGPC-1M and in-the-wild data demonstrate that the proposed method consistently outperforms prior baselines and exhibits strong generalization under real-world conditions.
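
The abstract names a modality dropout strategy but does not describe how it works. The sketch below shows one common form of the idea and is not the authors' implementation. It rests on several assumptions: each modality is already encoded as a flat feature vector, the partial point cloud is always kept, and each optional modality (image, text) is dropped independently with a fixed probability by replacing its features with zeros. The function name `modality_dropout`, the parameter `drop_prob`, and the keys `points`, `image`, and `text` are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence

Feature = List[float]


def modality_dropout(
    features: Dict[str, Feature],
    drop_prob: float = 0.3,
    always_keep: Sequence[str] = ("points",),
    rng: Optional[random.Random] = None,
) -> Dict[str, Feature]:
    """Randomly zero out optional modalities for one training sample.

    Assumption: the partial point cloud ("points") is never dropped,
    because it is the input that completion is conditioned on.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    rng = rng or random.Random()

    out: Dict[str, Feature] = {}
    for name, feat in features.items():
        # Each optional modality is dropped independently of the others.
        dropped = name not in always_keep and rng.random() < drop_prob
        # Zeros of the same length keep downstream shapes fixed; a learned
        # placeholder token would be another reasonable choice.
        out[name] = [0.0] * len(feat) if dropped else list(feat)
    return out


if __name__ == "__main__":
    sample = {"points": [0.2, 0.7], "image": [0.5, 0.1], "text": [0.9]}
    print(modality_dropout(sample, drop_prob=0.5, rng=random.Random(0)))
```

Training with randomly missing image or text features is intended to keep the model usable at inference time when only a subset of modalities is available, which is consistent with the robustness goal stated in the abstract.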