🤖 AI Summary
This work addresses a limitation of existing few-shot CLIP adaptation methods: they select the fusion weight on the test set or on an additional validation set, which violates the strict few-shot learning assumption. To resolve this, we propose Hold-One-Shot-Out (HOSO), an approach that reserves one training shot during adaptation to learn the fusion ratio for CLIP-Adapter without any external validation data. HOSO is the first method to enable adaptive fusion weight learning under a pure few-shot setting, decoupling adapter training from ratio estimation through a one-shot hold-out mechanism. Evaluated on 11 standard few-shot benchmarks, HOSO improves on the CLIP-Adapter baseline by more than 4 percentage points on average. In the 8-shot and 16-shot configurations, it outperforms CLIP-Adapter even when that baseline's ratio is selected using the test set.
📝 Abstract
In many CLIP adaptation methods, a blending ratio hyperparameter controls the trade-off between general pretrained CLIP knowledge and the limited, dataset-specific supervision from the few-shot examples. Most few-shot CLIP adaptation techniques either report results after sweeping the blending ratio on the test set or require an additional validation set to select it per dataset, and are therefore not strictly few-shot. We present a simple, validation-free method for learning the blending ratio in CLIP adaptation. Hold-One-Shot-Out (HOSO) allows CLIP-Adapter-style methods to compete in the newly established validation-free setting. CLIP-Adapter with HOSO (HOSO-Adapter) learns the blending ratio on a one-shot hold-out set, while the adapter trains on the remaining few-shot support examples. Under the validation-free few-shot protocol, HOSO-Adapter outperforms the CLIP-Adapter baseline by more than 4 percentage points on average across 11 standard few-shot datasets. Interestingly, in the 8- and 16-shot settings, HOSO-Adapter outperforms CLIP-Adapter even when the latter uses the optimal blending ratio selected on the test set. Ablation studies validate the one-shot hold-out mechanism and decoupled training, and show improvements over a baseline that learns the blending ratio naively. Code is available at: https://github.com/chris-vorster/HOSO-Adapter
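To make the two ingredients described above concrete, here is a minimal sketch in plain Python. It is not taken from the released code. The function names `hold_one_shot_out` and `blend` are hypothetical, the choice of one held-out example per class is an assumption about what "one-shot" means here, and the blending formula follows the usual CLIP-Adapter-style residual mix (ratio times adapted feature plus one minus ratio times pretrained feature). The optimisation of the ratio on the held-out shot is deliberately left out.

```python
import random
from typing import Dict, List, Sequence, Tuple


def hold_one_shot_out(
    support: Dict[str, List[str]], seed: int = 0
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Split a K-shot support set into (K-1)-shot training data and a
    one-shot hold-out set (assumption: one held-out example per class)."""
    rng = random.Random(seed)
    train: Dict[str, List[str]] = {}
    held_out: Dict[str, str] = {}
    for label, examples in support.items():
        if len(examples) < 2:
            raise ValueError(f"class {label!r} needs at least 2 shots to hold one out")
        idx = rng.randrange(len(examples))
        held_out[label] = examples[idx]                      # used only to fit the ratio
        train[label] = examples[:idx] + examples[idx + 1:]   # used only to fit the adapter
    return train, held_out


def blend(
    adapted: Sequence[float], pretrained: Sequence[float], ratio: float
) -> List[float]:
    """Residual blend: ratio * adapted + (1 - ratio) * pretrained.
    ratio = 0 keeps the pretrained CLIP feature unchanged."""
    if len(adapted) != len(pretrained):
        raise ValueError("feature vectors must have the same length")
    return [ratio * a + (1.0 - ratio) * p for a, p in zip(adapted, pretrained)]


# Hypothetical 4-shot support set with two classes (file names are placeholders).
support = {
    "cat": ["cat_0.jpg", "cat_1.jpg", "cat_2.jpg", "cat_3.jpg"],
    "dog": ["dog_0.jpg", "dog_1.jpg", "dog_2.jpg", "dog_3.jpg"],
}
train_split, holdout_split = hold_one_shot_out(support, seed=0)
mixed = blend([1.0, 0.0], [0.0, 1.0], ratio=0.25)
```

In this sketch the adapter would be fit on `train_split` only, and the ratio passed to `blend` would then be fit on `holdout_split` only, which is one way to read the decoupled training the abstract describes.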