DIET-CP: Lightweight and Data Efficient Self Supervised Continued Pretraining

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
In specialized-domain, few-shot scenarios, self-supervised continued pretraining faces critical bottlenecks: severe data scarcity, the infeasibility of hyperparameter search, and the fact that public foundation models are released as backbone weights only, without the state needed to resume pretraining. Method: DIET-CP is a lightweight, cross-modal, cross-architecture continued pretraining method that requires only a minimal amount of unlabeled data (e.g., 1,000 samples), no additional hyperparameters, and only the original backbone weights. Its core is a very simple unsupervised training objective designed to be both stable and efficient. Contribution/Results: Experiments show that DIET-CP significantly improves the adaptation of state-of-the-art vision foundation models (e.g., DINOv3) to extremely data-limited target distributions, removing the feasibility and efficiency barriers of continued pretraining in data-scarce regimes.

📝 Abstract
Continued pretraining offers a promising solution for adapting foundation models to a new target domain. However, in specialized domains, available datasets are often very small, limiting the applicability of SSL methods developed for large-scale pretraining and making hyperparameter search infeasible. In addition, pretrained models are usually released as backbone weights only, lacking important information to continue pretraining. We propose to bridge this gap with DIET-CP, a simple continued pretraining strategy, where any strong foundation model can be steered towards the new data distribution of interest. DIET-CP relies on a very simple objective, requires no labels, and introduces no more hyperparameters than supervised finetuning. It is stable across data modalities and backbone choices, while providing a significant performance boost for state-of-the-art models such as DINOv3 using only 1000 images.
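The abstract does not spell out the "very simple objective." As an illustration only: the name DIET-CP suggests a DIET-style objective (Datum IndEx as Target), where each unlabeled sample is classified as its own dataset index via label-smoothed cross-entropy through a linear head on the backbone features. The sketch below is an assumption about that objective, not the authors' code; the feature dimension, smoothing value, and step size are all illustrative choices.

```python
import numpy as np

# Minimal sketch of a DIET-style objective (an assumption, not the paper's
# code): every one of N unlabeled samples gets its own "class" (its index),
# so continued pretraining needs only a linear head W over backbone features
# and plain cross-entropy -- no labels, no extra SSL machinery.

rng = np.random.default_rng(0)

N, D = 1000, 64                    # e.g. 1000 unlabeled images, feature dim 64
feats = rng.normal(size=(N, D))    # stand-in for backbone embeddings
W = np.zeros((D, N))               # linear head: one output per sample index

def diet_loss_and_grad(feats, W, smoothing=0.1):
    """Cross-entropy where sample i's target label is index i,
    with label smoothing (illustrative hyperparameter)."""
    logits = feats @ W                              # (N, N)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    n = feats.shape[0]
    targets = np.full((n, n), smoothing / n)        # smoothed one-hot targets
    targets[np.arange(n), np.arange(n)] += 1.0 - smoothing
    loss = -(targets * np.log(probs + 1e-12)).sum(axis=1).mean()
    grad_W = feats.T @ (probs - targets) / n        # gradient through softmax
    return loss, grad_W

loss_before, g = diet_loss_and_grad(feats, W)
W -= 1.0 * g                                        # one SGD step on the head
loss_after, _ = diet_loss_and_grad(feats, W)
```

In a full continued-pretraining run the gradient would also flow into the backbone weights, which is what steers the model toward the new data distribution; here only the head is updated to keep the sketch self-contained.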
Problem

Research questions and friction points this paper is trying to address.

Adapting foundation models to small target domains
Overcoming limited data in specialized domain pretraining
Eliminating hyperparameter search for continued pretraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight continued pretraining strategy
No labels or extra hyperparameters required
Stable across data modalities and backbones