🤖 AI Summary
To address the high labeling cost in settings where pre-trained models coexist with abundant unlabeled data, this paper proposes CAMS, an online contextual active model selection framework. In each round, the learner receives an unlabeled data point as context, selects the pre-trained model expected to perform best for that context, and adaptively decides whether to query the true label. The method couples a contextual model selection mechanism with an active querying strategy, and it comes with theoretical guarantees on regret and query complexity under both adversarial and stochastic environments. Empirical evaluation on a diverse collection of benchmark classification tasks shows that the approach achieves comparable or better accuracy while using less than 10% of the labeling effort of existing methods on the CIFAR10 and DRIFT benchmarks.
📝 Abstract
While training models and labeling data are resource-intensive, a wealth of pre-trained models and unlabeled data exists. To effectively utilize these resources, we present an approach to actively select pre-trained models while minimizing labeling costs. We frame this as an online contextual active model selection problem: At each round, the learner receives an unlabeled data point as a context. The objective is to adaptively select the best model to make a prediction while limiting label requests. To tackle this problem, we propose CAMS, a contextual active model selection algorithm that relies on two novel components: (1) a contextual model selection mechanism, which leverages context information to make informed decisions about which model is likely to perform best for a given context, and (2) an active query component, which strategically chooses when to request labels for data points, minimizing the overall labeling cost. We provide rigorous theoretical analysis for the regret and query complexity under both adversarial and stochastic settings. Furthermore, we demonstrate the effectiveness of our algorithm on a diverse collection of benchmark classification tasks. Notably, CAMS requires substantially less labeling effort (less than 10%) compared to existing methods on CIFAR10 and DRIFT benchmarks, while achieving similar or better accuracy. Our code is publicly available at: https://github.com/xuefeng-cs/Contextual-Active-Model-Selection.
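The per-round loop the abstract describes (receive a context, select a model, predict, and decide whether to query the label) can be sketched as below. This is a minimal illustrative sketch, not the paper's CAMS algorithm: the threshold "pre-trained models", the exponential-weights model selection, the disagreement-based query rule, and the fixed label budget are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_models, n_rounds = 3, 200
weights = np.ones(n_models)   # belief over which model performs best
eta, budget = 0.5, 30         # learning rate and label budget (assumed)
queries = 0

def model_predict(m, x):
    # stand-in "pre-trained models": threshold classifiers of varying quality
    thresholds = [0.5, 0.45, 0.7]
    return int(x > thresholds[m])

for t in range(n_rounds):
    x = rng.random()                      # context: unlabeled data point
    true_y = int(x > 0.5)                 # ground truth (hidden until queried)
    preds = [model_predict(m, x) for m in range(n_models)]

    # contextual model selection: sample a model from normalized weights
    probs = weights / weights.sum()
    chosen = rng.choice(n_models, p=probs)
    y_hat = preds[chosen]

    # active query rule (assumed): ask for the label only when the
    # models disagree and the label budget is not exhausted
    if len(set(preds)) > 1 and queries < budget:
        queries += 1
        losses = np.array([p != true_y for p in preds], dtype=float)
        weights *= np.exp(-eta * losses)  # exponential-weights update
```

In this toy run the querying rule spends labels only on ambiguous rounds, so the weight mass concentrates on the most accurate model while far fewer than `n_rounds` labels are requested, which mirrors the cost-saving behavior the abstract reports.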