Positional Prompt Tuning for Efficient 3D Representation Learning

📅 2024-08-21
🏛️ arXiv.org
📈 Citations: 8
Influential: 2
🤖 AI Summary
Existing point cloud representation learning methods overlook the role of positional encoding (PE), while prevailing parameter-efficient fine-tuning (PEFT) approaches struggle to jointly optimize geometric fidelity and parameter efficiency. To address this, we propose Positional Prompt Tuning (PPT), a lightweight PEFT framework. PPT is the first to formulate high-dimensional positional encodings as learnable prompt embeddings, co-designed with multi-scale patch encodings to construct a feature abstraction module that jointly captures local geometry and global structure. Furthermore, it enables dynamic adapter coupling for PE adaptation within the PEFT paradigm. With only 1.05% trainable parameters, PPT achieves 95.01% classification accuracy on ScanObjectNN OBJ_BG—surpassing state-of-the-art prompt- and adapter-based methods. The implementation is publicly available.
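The core PEFT recipe described above — freeze the pre-trained Transformer backbone and train only the small positional-prompt module — can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' code: the layer names, parameter counts, and the 3 → 128 → embed_dim positional MLP shape are all hypothetical, chosen only to show how the trainable-parameter fraction stays tiny when just the positional encoding is unfrozen.

```python
# Hypothetical sketch of the PPT-style PEFT setup: a frozen Transformer
# backbone plus a small trainable positional-prompt MLP that maps patch
# center coordinates (x, y, z) to high-dimensional positional prompts.
from dataclasses import dataclass

@dataclass
class Param:
    name: str
    count: int        # number of scalar weights in this module
    trainable: bool   # whether gradients are computed for it

def trainable_ratio(params):
    """Fraction of scalar weights that receive gradients during fine-tuning."""
    total = sum(p.count for p in params)
    tuned = sum(p.count for p in params if p.trainable)
    return tuned / total

# Hypothetical parameter inventory (sizes are illustrative, not the paper's):
embed_dim = 384
backbone = [Param(f"block{i}", 7 * embed_dim * embed_dim, trainable=False)
            for i in range(12)]                      # frozen pre-trained blocks
pos_prompt = [
    Param("pos_mlp.fc1", 3 * 128 + 128, trainable=True),          # 3 coords -> 128
    Param("pos_mlp.fc2", 128 * embed_dim + embed_dim, trainable=True),  # 128 -> dim
]
params = backbone + pos_prompt

print(f"trainable fraction: {trainable_ratio(params):.2%}")
```

With these made-up sizes the trainable fraction lands well under 2%, mirroring the paper's point that a positional-prompt module is small enough to serve as the entire fine-tuned part (the reported 1.05% depends on the actual backbone and module sizes).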

📝 Abstract
Point cloud analysis has achieved significant progress and performs well on downstream tasks such as point cloud classification and segmentation. Noting the simplicity of the positional encoding structure in Transformer-based architectures, we treat positional encoding as a high-dimensional component and pair it with the patch encoder to supply multi-scale information. Together with the sequential Transformer, the module with positional encoding constitutes a multi-scale feature abstraction module that captures both local structure from the patches and global structure from the center points via positional encoding. Because the position embedding module contains only a few parameters, it fits the setting of PEFT (Parameter-Efficient Fine-Tuning) well, so we unfreeze these parameters for fine-tuning. At the same time, we review existing prompt- and adapter-tuning methods, propose a fresh formulation of prompts, and combine them with adapters as dynamic adjustments. Our proposed PEFT method, PPT, trains only 1.05% of the parameters yet achieves state-of-the-art results on several mainstream datasets, such as 95.01% accuracy on the ScanObjectNN OBJ_BG dataset. Code will be released at https://github.com/zsc000722/PPT.
Problem

Research questions and friction points this paper is trying to address.

Improving 3D representation learning efficiency through positional encoding
Developing parameter-efficient fine-tuning for point cloud analysis
Achieving state-of-the-art accuracy with minimal trainable parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-efficient fine-tuning with positional prompts
Increased patch tokens with trainable positional encoding
Frozen pre-trained parameters for efficient point cloud analysis