HGACNet: Hierarchical Graph Attention Network for Cross-Modal Point Cloud Completion

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Point cloud completion suffers from severe geometric incompleteness due to self-occlusion and sensor limitations, degrading downstream robotic tasks such as grasping and obstacle avoidance. To address this, we propose a single-view RGB-guided cross-modal point cloud completion framework. Our method introduces a hierarchical graph attention encoder to jointly model local and global structural continuity in point clouds. An attention-driven multi-scale cross-modal fusion module enables fine-grained alignment between RGB image priors and geometric features. Additionally, a contrastive loss is employed to enhance semantic consistency across modalities. Evaluated on ShapeNet-ViPC and YCB-Complete benchmarks, our approach achieves state-of-the-art performance in both quantitative metrics and qualitative reconstruction fidelity. Crucially, it demonstrates superior generalization and reconstruction accuracy in real-world robotic manipulation scenarios, validating its practical applicability for embodied perception systems.
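The summary mentions graph attention-based downsampling inside the hierarchical encoder. The paper's exact formulation isn't reproduced here; as an illustration only, a minimal NumPy sketch of one plausible variant — score each point by how strongly its k-NN graph neighbours attend to it, then keep the top fraction — might look like this (the function names and the saliency heuristic are assumptions, not the authors' implementation):

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbours of each point (brute force)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, 1:k + 1]  # column 0 is the point itself

def attention_downsample(points, feats, keep_ratio=0.5, k=4):
    """Keep the most-attended points of a k-NN graph (illustrative, not HGACNet's exact rule)."""
    idx = knn_indices(points, k)
    # Attention logits: feature similarity between each point and its neighbours.
    logits = (feats[:, None, :] * feats[idx]).sum(-1)           # (N, k)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                           # softmax over neighbours
    # Saliency of a point = total attention it receives across the graph.
    scores = np.zeros(len(points))
    np.add.at(scores, idx.ravel(), w.ravel())
    n_keep = max(1, int(len(points) * keep_ratio))
    keep = np.argsort(-scores)[:n_keep]
    return points[keep], feats[keep], keep
```

Applying this per encoder stage would yield the progressively coarser, attention-selected point sets the summary describes.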

📝 Abstract
Point cloud completion is essential for robotic perception and object reconstruction, supporting downstream tasks such as grasp planning, obstacle avoidance, and manipulation. However, incomplete geometry caused by self-occlusion and sensor limitations can significantly degrade downstream reasoning and interaction. To address these challenges, we propose HGACNet, a novel framework that reconstructs complete point clouds of individual objects by hierarchically encoding 3D geometric features and fusing them with image-guided priors from a single-view RGB image. At the core of our approach, the Hierarchical Graph Attention (HGA) encoder adaptively selects critical local points through graph attention-based downsampling and progressively refines hierarchical geometric features to better capture structural continuity and spatial relationships. To strengthen cross-modal interaction, we further design a Multi-Scale Cross-Modal Fusion (MSCF) module that performs attention-based feature alignment between hierarchical geometric features and structured visual representations, enabling fine-grained semantic guidance for completion. In addition, we propose a contrastive loss (C-Loss) to explicitly align the feature distributions across modalities, improving completion fidelity under modality discrepancy. Finally, extensive experiments conducted on both the ShapeNet-ViPC benchmark and the YCB-Complete dataset confirm the effectiveness of HGACNet, demonstrating state-of-the-art performance as well as strong applicability in real-world robotic manipulation tasks.
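The MSCF module is described as attention-based alignment between geometric and visual features. Its exact architecture is not detailed in this summary; a minimal sketch of the standard mechanism such a module would build on — point features querying image patch features via scaled dot-product cross-attention, with a residual merge — is shown below (dimensions and the residual choice are assumptions for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(point_feats, image_feats):
    """Each point feature attends over image patch features (one scale of a
    multi-scale fusion; illustrative, not the paper's exact MSCF design)."""
    d_k = point_feats.shape[-1]
    attn = softmax(point_feats @ image_feats.T / np.sqrt(d_k))  # (N_points, N_patches)
    fused = attn @ image_feats                                   # image prior per point
    return point_feats + fused                                   # residual fusion
```

In a multi-scale setup this would be applied once per encoder level, letting each resolution of geometric features pull in matching visual context.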
Problem

Research questions and friction points this paper is trying to address.

Reconstructing complete point clouds from incomplete geometry
Fusing 3D geometric features with image-guided priors
Addressing modality discrepancy through cross-modal feature alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Graph Attention encoder for geometric features
Multi-Scale Cross-Modal Fusion with attention alignment
Contrastive loss for cross-modal feature distribution alignment
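The C-Loss aligns feature distributions across modalities. The paper's exact loss is not given here; a common instantiation of this idea is a symmetric InfoNCE objective over matched point/image embedding pairs, sketched below under that assumption (the temperature value and symmetric form are illustrative choices):

```python
import numpy as np

def info_nce(point_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE: matched point/image pairs in a batch are positives,
    all other pairings are negatives (an assumed form of the paper's C-Loss)."""
    p = point_emb / np.linalg.norm(point_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = p @ v.T / temperature            # (B, B) scaled cosine similarities
    labels = np.arange(len(p))

    def ce(l):  # cross-entropy with the diagonal as targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (ce(logits) + ce(logits.T))  # point->image and image->point
```

Minimizing this pulls corresponding point and image embeddings together while pushing apart mismatched pairs, which is the cross-modal consistency the summary attributes to the C-Loss.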