🤖 AI Summary
In professional-domain few-shot scenarios, self-supervised continual pretraining faces critical bottlenecks: severe data scarcity, the infeasibility of hyperparameter tuning, and the fact that pretrained models are typically released as backbone weights only, without the training state needed to resume pretraining.
Method: We propose DIET-CP, a lightweight, cross-modal, cross-architecture continual pretraining method that needs only a small amount of unlabeled data (e.g., 1,000 samples), introduces no additional hyperparameters, and requires nothing beyond the original backbone weights. Its core is a very simple unsupervised training objective designed to be both stable and efficient.
Contribution/Results: Experiments demonstrate that DIET-CP significantly improves the adaptability of state-of-the-art vision foundation models (e.g., DINOv3) to new target distributions with extremely limited data, overcoming the feasibility and efficiency barriers of continual pretraining in data-scarce regimes.
📝 Abstract
Continued pretraining offers a promising solution for adapting foundation models to a new target domain. However, in specialized domains the available datasets are often very small, limiting the applicability of self-supervised learning (SSL) methods developed for large-scale pretraining and making hyperparameter search infeasible. In addition, pretrained models are usually released as backbone weights only, lacking important information needed to continue pretraining. We propose to bridge this gap with DIET-CP, a simple continued pretraining strategy that can steer any strong foundation model towards a new data distribution of interest. DIET-CP relies on a very simple objective, requires no labels, and introduces no more hyperparameters than supervised finetuning. It is stable across data modalities and backbone choices, and provides a significant performance boost for state-of-the-art models such as DINOv3 using only 1000 images.
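The abstract does not spell out the objective, but the name suggests a DIET-style ("Datum IndEx as Target") loss: each unlabeled sample's own dataset index serves as its classification target, and a linear head on top of the backbone is trained with plain cross-entropy, so no labels or extra hyperparameters are needed. Below is a minimal toy sketch of that idea under this assumption; the feature matrix, dimensions, learning rate, and the frozen-backbone simplification are all illustrative, not the paper's actual setup (in real continual pretraining the backbone would be updated as well).

```python
import numpy as np

# Toy sketch of a DIET-style objective (assumption: each sample's
# dataset index is its classification target, trained with
# cross-entropy over an N-way linear head).
rng = np.random.default_rng(0)
N, d = 1000, 64                      # e.g. 1000 images, 64-dim features
feats = rng.normal(size=(N, d))      # stand-in for backbone embeddings
W = np.zeros((d, N))                 # linear head: one "class" per index

def loss_and_grad(W):
    logits = feats @ W               # (N, N): sample i should predict class i
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
    idx = np.arange(N)
    loss = -np.mean(np.log(p[idx, idx]))          # cross-entropy on own index
    p[idx, idx] -= 1.0                            # dL/dlogits = (p - onehot)/N
    grad = feats.T @ p / N
    return loss, grad

losses = []
for _ in range(50):                  # a few gradient steps on the head
    loss, g = loss_and_grad(W)
    W -= 1.0 * g
    losses.append(loss)
# Starts at ln(N) (uniform predictions) and decreases as the head
# learns to discriminate individual samples.
```

With the head at zero, the first loss equals ln(1000) ≈ 6.91 and then drops as training proceeds; the appeal of such an objective is exactly what the abstract claims: no labels, no augmentation pipeline, and essentially the hyperparameters of supervised finetuning.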