🤖 AI Summary
This work addresses the challenge of fine-tuning 3D foundation models in low-data regimes, where existing methods often suffer from overfitting, degradation of pretrained representations, or reduced inference efficiency due to added modules. To overcome these limitations, we propose Momentum-Consistency Fine-Tuning (MCFT), a novel adapter-free efficient fine-tuning paradigm. MCFT selectively updates a subset of encoder parameters while enforcing a momentum-consistency constraint, thereby preserving generic representations and maintaining the original inference speed without introducing any additional trainable components. Extended with semi-supervised learning and structured pruning, MCFT achieves a 3.30% gain under 5-shot settings and up to a 6.13% improvement with semi-supervised learning, significantly outperforming current approaches while balancing accuracy and deployment efficiency, making it particularly suitable for resource-constrained scenarios.
📝 Abstract
Point cloud foundation models demonstrate strong generalization, yet adapting them to downstream tasks remains challenging in low-data regimes. Full fine-tuning often leads to overfitting and significant drift from pre-trained representations, while existing parameter-efficient fine-tuning (PEFT) methods mitigate this issue by introducing additional trainable components at the cost of increased inference-time latency. We propose Momentum-Consistency Fine-Tuning (MCFT), an adapter-free approach that bridges the gap between full and parameter-efficient fine-tuning. MCFT selectively fine-tunes a portion of the pre-trained encoder while enforcing a momentum-based consistency constraint to preserve task-agnostic representations. Unlike PEFT methods, MCFT introduces no additional representation learning parameters beyond a standard task head, maintaining the original model's parameter count and inference efficiency. We further extend MCFT with two variants: a semi-supervised framework that leverages abundant unlabeled data to enhance few-shot performance, and a pruning-based variant that improves computational efficiency through structured layer removal. Extensive experiments on object recognition and part segmentation benchmarks demonstrate that MCFT consistently outperforms prior methods, achieving a 3.30% gain in 5-shot settings and up to a 6.13% improvement with semi-supervised learning, while remaining well-suited for resource-constrained deployment.
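The core mechanism described above, selectively updating a subset of encoder parameters while a momentum (exponential moving average) teacher anchors the representation, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the parameter names, the learning rate, the momentum value, and the mean-squared consistency penalty are all hypothetical choices.

```python
import numpy as np

def ema_update(teacher, student, m=0.999):
    """Momentum (exponential moving average) update: the teacher drifts
    slowly toward the student and receives no gradients of its own."""
    return {k: m * teacher[k] + (1.0 - m) * student[k] for k in teacher}

def consistency_loss(student_feat, teacher_feat):
    """Illustrative consistency penalty (mean squared distance) tying the
    student's features to the slow-moving teacher, discouraging drift from
    the pre-trained representation."""
    return float(np.mean((student_feat - teacher_feat) ** 2))

# Hypothetical encoder with two parameter blocks; only one is fine-tuned
# (selective update), the other stays frozen at its pre-trained value.
student = {"block1.w": np.ones(4), "block2.w": np.ones(4)}
teacher = {k: v.copy() for k, v in student.items()}
trainable = {"block2.w"}

# One illustrative training step: apply a (fake) gradient to the trainable
# subset only, then refresh the momentum teacher.
grads = {"block2.w": np.full(4, 0.1)}
for k in trainable:
    student[k] -= 0.5 * grads[k]   # gradient step, lr = 0.5 (illustrative)
teacher = ema_update(teacher, student, m=0.9)
```

Because the teacher is a parameter average rather than a separate module, nothing extra is attached to the encoder at inference time, which is how an approach of this shape keeps the original parameter count and latency.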