A Highly Efficient Diversity-based Input Selection for DNN Improvement Using VLMs

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high annotation cost of fine-tuning deep neural networks and the computational inefficiency of existing diversity-based active learning methods. To overcome these limitations, the authors propose an efficient hybrid input selection strategy that leverages vision-language models (VLMs) to extract high-level semantic concepts and constructs a lightweight Concept-Based Diversity (CBD) metric that approximates geometric diversity. CBD is combined with margin-based uncertainty for sample selection, substantially reducing computational complexity while maintaining strong acquisition performance. Extensive experiments show that CBD-based selection consistently outperforms five state-of-the-art baselines across various models, datasets, and labeling budgets, achieving competitive accuracy with computational efficiency comparable to simple uncertainty-based methods, making it well suited to large-scale scenarios such as ImageNet.

📝 Abstract
Maintaining or improving the performance of Deep Neural Networks (DNNs) through fine-tuning requires labeling newly collected inputs, a process that is often costly and time-consuming. To alleviate this problem, input selection approaches have been developed in recent years to identify small yet highly informative subsets for labeling. Diversity-based selection is one of the most effective approaches for this purpose. However, such methods are often computationally intensive and lack scalability for large input sets, limiting their practical applicability. To address this challenge, we introduce Concept-Based Diversity (CBD), a highly efficient metric for image inputs that leverages Vision-Language Models (VLMs). Our results show that CBD exhibits a strong correlation with Geometric Diversity (GD), an established diversity metric, while requiring only a fraction of its computation time. Building on this finding, we propose a hybrid input selection approach that combines CBD with Margin, a simple uncertainty metric. We conduct a comprehensive evaluation across a diverse set of DNN models, input sets, selection budgets, and the five most effective state-of-the-art selection baselines. The results demonstrate that CBD-based selection consistently outperforms all baselines at guiding input selection to improve the DNN model. Furthermore, the CBD-based selection approach remains highly efficient, requiring selection times close to those of simple uncertainty-based methods such as Margin, even on larger input sets like ImageNet. These results confirm not only the effectiveness and computational advantage of the CBD-based approach, particularly compared to hybrid baselines, but also its scalability in repetitive and extensive input selection scenarios.
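The abstract describes a hybrid of Margin uncertainty (the gap between the top-2 predicted class probabilities) and concept-level batch diversity. The paper's exact CBD formulation is not given here, so the sketch below is a minimal illustration under assumptions: it treats each input's VLM-extracted concepts as a set and greedily rewards inputs that add unseen concepts to the batch while penalizing confident (high-margin) ones. The scoring rule and the `concepts` representation are hypothetical, not the authors' definition.

```python
import numpy as np

def margin_scores(probs):
    """Margin uncertainty: top-1 minus top-2 class probability.
    A smaller margin means the model is less certain."""
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def greedy_hybrid_select(probs, concepts, budget):
    """Greedy hybrid selection (illustrative, not the paper's exact rule):
    probs    -- (n, k) softmax outputs of the DNN under improvement
    concepts -- list of n sets of VLM-extracted concept strings
    budget   -- number of inputs to select for labeling
    Each step picks the input maximizing (new concepts covered) - margin,
    i.e. favoring batch-level concept diversity and low confidence."""
    margins = margin_scores(np.asarray(probs))
    selected, covered = [], set()
    candidates = list(range(len(concepts)))
    for _ in range(min(budget, len(candidates))):
        best, best_score = None, None
        for i in candidates:
            new_concepts = len(concepts[i] - covered)  # concepts not yet in the batch
            score = new_concepts - margins[i]
            if best_score is None or score > best_score:
                best, best_score = i, score
        selected.append(best)
        covered |= concepts[best]
        candidates.remove(best)
    return selected

# Tiny usage example with three inputs and three classes:
probs = [[0.90, 0.05, 0.05],   # confident -> large margin
         [0.40, 0.35, 0.25],   # uncertain
         [0.34, 0.33, 0.33]]   # most uncertain
concepts = [{"cat"}, {"dog"}, {"dog"}]
picked = greedy_hybrid_select(probs, concepts, budget=2)
```

Because concept-set operations replace pairwise distance computations in an embedding space, a scheme of this shape scales near-linearly in the candidate pool, which is consistent with the paper's claim of selection times close to plain Margin.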
Problem

Research questions and friction points this paper is trying to address.

input selection
diversity-based selection
deep neural networks
labeling cost
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-Based Diversity
Vision-Language Models
Input Selection
Active Learning
Efficient DNN Fine-tuning