MGPC: Multimodal Network for Generalizable Point Cloud Completion With Modality Dropout and Progressive Decoding

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of existing point cloud completion methods to novel objects and real-world scenes. To this end, the authors propose a unified multimodal architecture that integrates point clouds, RGB images, and textual descriptions. Robustness is enhanced through a modality dropout mechanism, while a dedicated multimodal Transformer fusion module enables effective cross-modal alignment. A progressive generator is further introduced to refine geometric detail reconstruction. Concurrently, the authors construct MGPC-1M, the first million-scale multimodal point cloud completion dataset, enabling large-scale training and evaluation. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art approaches on both MGPC-1M and real-scene benchmarks, achieving superior generalization capability and completion quality.

📝 Abstract
Point cloud completion aims to recover complete 3D geometry from partial observations caused by limited viewpoints and occlusions. Existing learning-based works, including 3D Convolutional Neural Network (CNN)-based, point-based, and Transformer-based methods, have achieved strong performance on synthetic benchmarks. However, due to the limitations of modality, scalability, and generative capacity, their generalization to novel objects and real-world scenarios remains challenging. In this paper, we propose MGPC, a generalizable multimodal point cloud completion framework that integrates point clouds, RGB images, and text within a unified architecture. MGPC introduces an innovative modality dropout strategy, a Transformer-based fusion module, and a novel progressive generator to improve robustness, scalability, and geometric modeling capability. We further develop an automatic data generation pipeline and construct MGPC-1M, a large-scale benchmark with over 1,000 categories and one million training pairs. Extensive experiments on MGPC-1M and in-the-wild data demonstrate that the proposed method consistently outperforms prior baselines and exhibits strong generalization under real-world conditions.
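The modality dropout strategy described in the abstract can be sketched as follows. This is an illustrative sketch only: the function name, the drop probability, and the zeroing scheme are assumptions for exposition, not the paper's actual implementation.

```python
import random

def modality_dropout(features, p_drop=0.3, training=True):
    """Randomly suppress entire modalities during training so the fusion
    module learns to complete shapes even when some inputs are missing.

    features: dict mapping modality name (e.g. "points", "rgb", "text")
              to a feature vector (here a plain list of floats).
    Note: p_drop=0.3 and zero-masking are illustrative assumptions.
    """
    if not training:
        return dict(features)  # no dropout at inference time
    # Independently decide whether each modality survives this step.
    keep = {name: random.random() >= p_drop for name in features}
    # Guarantee at least one modality survives, otherwise there is no input.
    if not any(keep.values()):
        keep[random.choice(list(features))] = True
    return {name: (vec if keep[name] else [0.0] * len(vec))
            for name, vec in features.items()}
```

At inference time all available modalities are passed through unchanged; during training, randomly masking whole modalities forces the fusion Transformer not to over-rely on any single input stream.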
Problem

Research questions and friction points this paper is trying to address.

point cloud completion
generalization
multimodal learning
real-world scenarios
novel objects
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal fusion
modality dropout
progressive decoding
point cloud completion
generalizable 3D generation
Jiangyuan Liu
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100043, China, and also with the State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China
Hongxuan Ma
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China
Yuhao Zhao
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China, and also with the Chemical Defense Institute, Academy of Military Sciences, Beijing 102205, China
Zhe Liu
Chemical Defense Institute, Academy of Military Sciences, Beijing 102205, China
Jian Wang
Institute of Automation, Chinese Academy of Sciences
Bio-inspired robotics, Intelligent control, Mechatronics
Wei Zou
PKU, Samsung, Baidu, Didi, Ke
Speech, NLP, LLM, Multimodal