Hold-One-Shot-Out (HOSO) for Validation-Free Few-Shot CLIP Adapters

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation in existing few-shot CLIP adaptation methods, which rely on the test set or an additional validation set to select fusion weights—violating the strict few-shot learning assumption. To resolve this, we propose Hold-One-Shot-Out (HOSO), a novel approach that reserves one training sample during adaptation to automatically learn the optimal fusion ratio for CLIP-Adapter without any external validation data. HOSO is the first method to enable adaptive fusion weight learning under a pure few-shot setting, effectively decoupling model training from ratio estimation through a single-sample hold-out mechanism. Evaluated on 11 standard few-shot benchmarks, HOSO achieves an average improvement of over 4 percentage points and outperforms baseline methods—even those tuned using test-set information—under both 8-shot and 16-shot configurations.

📝 Abstract
In many CLIP adaptation methods, a blending-ratio hyperparameter controls the trade-off between general pretrained CLIP knowledge and the limited, dataset-specific supervision from the few-shot examples. Most few-shot CLIP adaptation techniques report results after ablating the blending ratio on the test set, or require an additional validation set to select the ratio per dataset, and are thus not strictly few-shot. We present a simple, validation-free method for learning the blending ratio in CLIP adaptation. Hold-One-Shot-Out (HOSO) enables CLIP-Adapter-style methods to compete in the newly established validation-free setting. CLIP-Adapter with HOSO (HOSO-Adapter) learns the blending ratio on a one-shot hold-out set, while the adapter trains on the remaining few-shot support examples. Under the validation-free few-shot protocol, HOSO-Adapter outperforms the CLIP-Adapter baseline by more than 4 percentage points on average across 11 standard few-shot datasets. Notably, in the 8- and 16-shot settings, HOSO-Adapter outperforms CLIP-Adapter even when the latter's blending ratio is tuned optimally on the test set. Ablation studies validate the one-shot hold-out mechanism, decoupled training, and improvements over a naively learnt blending-ratio baseline. Code is released here: https://github.com/chris-vorster/HOSO-Adapter
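The abstract's fusion-and-hold-out idea can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`blend_logits`, `hoso_select_alpha`) and the grid search over candidate ratios are assumptions for exposition; the paper learns the ratio rather than grid-searching it.

```python
import numpy as np

def blend_logits(clip_logits, adapter_logits, alpha):
    # CLIP-Adapter-style fusion: alpha trades off the few-shot adapter's
    # predictions against pretrained zero-shot CLIP predictions.
    return alpha * adapter_logits + (1.0 - alpha) * clip_logits

def hoso_select_alpha(clip_logits_heldout, adapter_logits_heldout,
                      labels_heldout, candidate_alphas):
    # Hold-One-Shot-Out selection: the adapter was trained on the
    # remaining support shots; the single held-out shot per class is
    # used here to pick the blending ratio. No test or external
    # validation data is touched.
    best_alpha, best_acc = candidate_alphas[0], -1.0
    for alpha in candidate_alphas:
        preds = blend_logits(clip_logits_heldout,
                             adapter_logits_heldout,
                             alpha).argmax(axis=1)
        acc = float((preds == labels_heldout).mean())
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha

# Toy example: CLIP misclassifies both held-out shots, the adapter
# classifies both correctly, so the selected ratio leans on the adapter.
clip_logits = np.array([[0.0, 1.0], [1.0, 0.0]])
adapter_logits = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
alpha_star = hoso_select_alpha(clip_logits, adapter_logits,
                               labels, [0.0, 0.5, 1.0])
```

In the real method the adapter is trained on the support shots minus the held-out one, decoupling adapter training from ratio estimation as the summary describes; the toy logits above just make the selection step concrete.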
Problem

Research questions and friction points this paper is trying to address.

few-shot learning
CLIP adaptation
blending ratio
validation-free
hyperparameter selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hold-One-Shot-Out
validation-free
few-shot learning
CLIP adaptation
blending ratio
Chris Vorster
ML-Labs, Dublin City University, Dublin, Ireland
Mayug Maniparambil
IIT Madras
Computer Vision, Foundation Models, NLP, Multi Modal Models
Noel E. O'Connor
CEO, Insight Centre for Data Analytics, Dublin City University
Multimedia content analysis, information retrieval, machine learning, artificial intelligence, computer vision
Noel Murphy
ML-Labs, Dublin City University, Dublin, Ireland
Derek Molloy
ML-Labs, Dublin City University, Dublin, Ireland