🤖 AI Summary
The best pre-trained vision backbone for few-shot image classification is dataset-dependent, so recommendations from generic benchmarks do not generalize. Method: This paper proposes an efficient, lightweight, task-oriented backbone selection method that replaces costly exhaustive search and generic benchmark recommendations with a fast, GPU-efficient heuristic evaluation that completes within approximately one hour on a single GPU. It integrates task-aware backbone scoring and ranking, shifting the selection paradigm from "generic-benchmark-driven" to "task-performance-driven" and drastically reducing computational overhead. Contribution/Results: Experiments on four standard vision benchmarks demonstrate that the selected backbones consistently achieve higher classification accuracy than those recommended by generic benchmarks, validating both effectiveness and practicality.
📝 Abstract
This work tackles the challenge of efficiently selecting high-performance pre-trained vision backbones for specific target tasks. Although exhaustive search within a finite set of backbones can solve this problem, it becomes impractical for large datasets and backbone pools. To address this, we introduce Vision Backbone Efficient Selection (VIBES), which aims to quickly find well-suited backbones, potentially trading off optimality for efficiency. We propose several simple yet effective heuristics to address VIBES and evaluate them across four diverse computer vision datasets. Our results show that these approaches can identify backbones that outperform those selected from generic benchmarks, even within a limited search budget of one hour on a single GPU. We believe VIBES marks a paradigm shift from benchmarks to task-specific optimization.
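To make the idea of budgeted, task-aware backbone scoring and ranking concrete, here is a minimal sketch. It is not the paper's actual heuristic: the "backbones" are hypothetical random feature extractors, the proxy score is nearest-class-centroid accuracy on a tiny synthetic few-shot task, and the one-GPU-hour budget is replaced by a short wall-clock budget. All names (`make_backbone`, `score_backbone`, etc.) are illustrative assumptions.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for pre-trained backbones: each maps raw inputs
# to a feature space. In practice these would be frozen vision models.
def make_backbone(dim, noise):
    proj = rng.normal(size=(16, dim))
    def extract(x):
        return x @ proj + noise * rng.normal(size=(x.shape[0], dim))
    return extract

backbones = {
    "bb_small": make_backbone(8, 1.0),
    "bb_medium": make_backbone(32, 0.5),
    "bb_large": make_backbone(64, 0.1),
}

# Tiny synthetic few-shot task: 3 classes with class-dependent means.
n_classes, shots = 3, 5
means = rng.normal(scale=3.0, size=(n_classes, 16))
def sample(n_per_class):
    X = np.vstack([m + rng.normal(size=(n_per_class, 16)) for m in means])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y

X_sup, y_sup = sample(shots)   # support set (few labeled examples)
X_qry, y_qry = sample(20)      # query set for scoring

def score_backbone(extract):
    """Cheap task-aware proxy: nearest-class-centroid accuracy on the query set."""
    f_sup, f_qry = extract(X_sup), extract(X_qry)
    centroids = np.vstack([f_sup[y_sup == c].mean(axis=0) for c in range(n_classes)])
    dists = ((f_qry[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return (dists.argmin(axis=1) == y_qry).mean()

# Score candidates until the search budget is exhausted, then rank.
budget_s, t0, scores = 5.0, time.time(), {}
for name, extract in backbones.items():
    if time.time() - t0 > budget_s:
        break
    scores[name] = score_backbone(extract)

best = max(scores, key=scores.get)
print(f"selected backbone: {best} (proxy accuracy {scores[best]:.2f})")
```

The design choice worth noting is that the score is computed directly on the target task's data rather than read off a generic benchmark leaderboard, which is the paradigm shift the paper argues for.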