Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud

📅 2024-04-25
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
Existing self-supervised point cloud learning methods suffer from lengthy pre-training, reliance on reconstruction in the input space, or dependence on additional modalities. This paper proposes Point-JEPA, a joint embedding predictive architecture designed for point cloud data that requires neither input-space reconstruction nor a second modality. Its central component is a sequencer that orders point cloud patch embeddings so that their proximity can be read from their indices when selecting context and target blocks. The same proximity computation is shared between context and target selection, which further improves efficiency. Point-JEPA achieves results competitive with state-of-the-art methods on the ModelNet40 and ScanObjectNN benchmarks.

📝 Abstract
Recent advancements in self-supervised learning in the point cloud domain have demonstrated significant potential. However, these methods often suffer from drawbacks, including lengthy pre-training time, the necessity of reconstruction in the input space, or the necessity of additional modalities. In order to address these issues, we introduce Point-JEPA, a joint embedding predictive architecture designed specifically for point cloud data. To this end, we introduce a sequencer that orders point cloud patch embeddings to efficiently compute and utilize their proximity based on the indices during target and context selection. The sequencer also allows shared computations of the patch embeddings' proximity between context and target selection, further improving the efficiency. Experimentally, our method achieves competitive results with state-of-the-art methods while avoiding the reconstruction in the input space or additional modality.
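The sequencer idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the ordering is built, so everything below is an assumption made for illustration: a greedy nearest-neighbor ordering of patch centers, a starting point chosen by smallest coordinate sum, and the hypothetical helper names `greedy_sequence` and `select_blocks`. The point it shows is that, once patches are ordered, spatially coherent context and target blocks can be taken as contiguous index ranges, and both selections reuse the same ordering.

```python
import math
import random


def greedy_sequence(centers):
    """Order patch centers so that adjacent indices tend to be spatially close.

    Assumption (not stated in the abstract): start at the center with the
    smallest coordinate sum, then repeatedly move to the nearest unvisited one.
    """
    remaining = set(range(len(centers)))
    current = min(remaining, key=lambda i: sum(centers[i]))
    order = [current]
    remaining.remove(current)
    while remaining:
        current = min(
            remaining, key=lambda i: math.dist(centers[i], centers[current])
        )
        order.append(current)
        remaining.remove(current)
    return order


def select_blocks(order, target_len, context_len, rng):
    """Pick a contiguous target block and a disjoint contiguous context block.

    Both blocks are index ranges over the same ordering, so the proximity
    computation done in greedy_sequence is shared between the two selections.
    """
    n = len(order)
    t_start = rng.randrange(0, n - target_len + 1)
    target_pos = set(range(t_start, t_start + target_len))
    # Keep only context windows that do not overlap the target block.
    starts = [
        s
        for s in range(0, n - context_len + 1)
        if target_pos.isdisjoint(range(s, s + context_len))
    ]
    if not starts:
        raise ValueError("no context window fits beside the target block")
    c_start = rng.choice(starts)
    target = [order[p] for p in range(t_start, t_start + target_len)]
    context = [order[p] for p in range(c_start, c_start + context_len)]
    return context, target


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical patch centers: 16 random points in the unit cube.
    centers = [(rng.random(), rng.random(), rng.random()) for _ in range(16)]
    order = greedy_sequence(centers)
    context, target = select_blocks(order, target_len=4, context_len=6, rng=rng)
    print("context patches:", context)
    print("target patches:", target)
```

In a JEPA-style setup, the context patches would be encoded and used to predict the embeddings of the target patches. That part is omitted here because the abstract gives no architectural details.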

Problem

Research questions and friction points this paper is trying to address.

Self-supervised Learning
Point Cloud Data
Training Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Point-JEPA
Efficient Similarity Computation
Self-supervised Learning