An Adapter-free Fine-tuning Approach for Tuning 3D Foundation Models

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of fine-tuning 3D foundation models in low-data regimes, where existing methods often suffer from overfitting, degradation of pretrained representations, or reduced inference efficiency due to added modules. To overcome these limitations, we propose Momentum-Consistency Fine-Tuning (MCFT), a novel adapter-free efficient fine-tuning paradigm. MCFT selectively updates a subset of encoder parameters while enforcing a momentum-consistency constraint, thereby preserving generic representations and maintaining the original inference speed without adding any extra trainable components beyond a standard task head. Integrated with semi-supervised learning and structured pruning, MCFT achieves performance gains of 3.30% to 6.13% under 5-shot settings, significantly outperforming current approaches while balancing accuracy and deployment efficiency, making it particularly suitable for resource-constrained scenarios.

📝 Abstract
Point cloud foundation models demonstrate strong generalization, yet adapting them to downstream tasks remains challenging in low-data regimes. Full fine-tuning often leads to overfitting and significant drift from pre-trained representations, while existing parameter-efficient fine-tuning (PEFT) methods mitigate this issue by introducing additional trainable components at the cost of increased inference-time latency. We propose Momentum-Consistency Fine-Tuning (MCFT), an adapter-free approach that bridges the gap between full and parameter-efficient fine-tuning. MCFT selectively fine-tunes a portion of the pre-trained encoder while enforcing a momentum-based consistency constraint to preserve task-agnostic representations. Unlike PEFT methods, MCFT introduces no additional representation learning parameters beyond a standard task head, maintaining the original model's parameter count and inference efficiency. We further extend MCFT with two variants: a semi-supervised framework that leverages abundant unlabeled data to enhance few-shot performance, and a pruning-based variant that improves computational efficiency through structured layer removal. Extensive experiments on object recognition and part segmentation benchmarks demonstrate that MCFT consistently outperforms prior methods, achieving a 3.30% gain in 5-shot settings and up to a 6.13% improvement with semi-supervised learning, while remaining well-suited for resource-constrained deployment.
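The core mechanism the abstract describes, selectively fine-tuning a subset of pre-trained weights while a momentum-based copy anchors them, can be sketched in plain Python. This is a hedged illustration, not the paper's implementation: the scalar "weights", the target values, the squared-error consistency penalty, and the hyperparameters `m`, `lam`, and `lr` are all illustrative assumptions.

```python
# Sketch of momentum-consistency fine-tuning: a "student" copy of selected
# encoder weights is updated by the task gradient, a momentum (EMA)
# "teacher" copy trails it, and a consistency penalty pulls the student
# back toward the teacher so it does not drift far from the pre-trained
# representation. All concrete values here are toy assumptions.

def momentum_update(teacher, student, m=0.999):
    """EMA update: teacher <- m * teacher + (1 - m) * student."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]

def consistency_penalty(teacher, student):
    """Squared-error drift between teacher and student parameters."""
    return sum((t - s) ** 2 for t, s in zip(teacher, student))

# Toy loop on scalar "weights": only the selected subset (here, all three)
# receives gradient steps; frozen weights would simply be left out.
student = [1.0, -0.5, 2.0]   # selected pre-trained weights being tuned
teacher = list(student)      # momentum copy, initialized identically
targets = [1.5, 0.0, 1.0]    # stand-in for what the task loss prefers
lam, lr = 0.1, 0.01          # consistency weight and learning rate

for step in range(100):
    # Stand-in task gradient pulling each weight toward its target.
    task_grad = [2.0 * (s - g) for s, g in zip(student, targets)]
    # Gradient of the consistency penalty pulls the student to the teacher.
    cons_grad = [2.0 * (s - t) for s, t in zip(student, teacher)]
    student = [s - lr * (tg + lam * cg)
               for s, tg, cg in zip(student, task_grad, cons_grad)]
    teacher = momentum_update(teacher, student)
```

Because the teacher moves only at rate `1 - m`, it stays near the pre-trained weights over short fine-tuning runs, which is what limits drift; no adapter modules are inserted, so inference uses the original parameter count.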
Problem

Research questions and friction points this paper is trying to address.

3D foundation models
parameter-efficient fine-tuning
low-data regimes
overfitting
inference latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapter-free fine-tuning
Momentum-consistency constraint
Parameter-efficient tuning
3D foundation models
Semi-supervised learning