A Highly Efficient Diversity-based Input Selection for DNN Improvement Using VLMs

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high annotation cost of fine-tuning deep neural networks and the computational inefficiency of existing diversity-based active learning methods. To overcome these limitations, the authors propose an efficient hybrid input selection strategy that leverages vision-language models (VLMs) to extract high-level semantic concepts and constructs a lightweight Concept-Based Diversity (CBD) metric that approximates geometric diversity. CBD is combined with margin-based uncertainty for sample selection, substantially reducing computational complexity while maintaining strong acquisition performance. Extensive experiments show that CBD-based selection consistently outperforms five state-of-the-art baselines across various models, datasets, and labeling budgets, achieving competitive accuracy with computational efficiency comparable to simple uncertainty-based methods, making it well suited to large-scale scenarios such as ImageNet.

📝 Abstract
Maintaining or improving the performance of Deep Neural Networks (DNNs) through fine-tuning requires labeling newly collected inputs, a process that is often costly and time-consuming. To alleviate this problem, input selection approaches have been developed in recent years to identify small yet highly informative subsets for labeling. Diversity-based selection is one of the most effective approaches for this purpose. However, such methods are often computationally intensive and lack scalability for large input sets, limiting their practical applicability. To address this challenge, we introduce Concept-Based Diversity (CBD), a highly efficient metric for image inputs that leverages Vision-Language Models (VLMs). Our results show that CBD exhibits a strong correlation with Geometric Diversity (GD), an established diversity metric, while requiring only a fraction of its computation time. Building on this finding, we propose a hybrid input selection approach that combines CBD with Margin, a simple uncertainty metric. We conduct a comprehensive evaluation across a diverse set of DNN models, input sets, selection budgets, and the five most effective state-of-the-art selection baselines. The results demonstrate that CBD-based selection consistently outperforms all baselines at guiding input selection to improve the DNN model. Furthermore, the CBD-based selection approach remains highly efficient, requiring selection times close to those of simple uncertainty-based methods such as Margin, even on larger input sets like ImageNet. These results confirm not only the effectiveness and computational advantage of the CBD-based approach, particularly compared to hybrid baselines, but also its scalability in repetitive and extensive input selection scenarios.
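The abstract describes a hybrid of Margin uncertainty (the gap between the top-2 predicted class probabilities) and concept-level batch diversity. The paper's exact CBD formulation is not given here, so the sketch below is a minimal illustration under assumptions: it treats each input's VLM-extracted concepts as a set and greedily rewards inputs that add unseen concepts to the batch while penalizing confident (high-margin) ones. The scoring rule and the `concepts` representation are hypothetical, not the authors' definition.

```python
import numpy as np

def margin_scores(probs):
    """Margin uncertainty: top-1 minus top-2 class probability.
    A smaller margin means the model is less certain."""
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def greedy_hybrid_select(probs, concepts, budget):
    """Greedy hybrid selection (illustrative, not the paper's exact rule):
    probs    -- (n, k) softmax outputs of the DNN under improvement
    concepts -- list of n sets of VLM-extracted concept strings
    budget   -- number of inputs to select for labeling
    Each step picks the input maximizing (new concepts covered) - margin,
    i.e. favoring batch-level concept diversity and low confidence."""
    margins = margin_scores(np.asarray(probs))
    selected, covered = [], set()
    candidates = list(range(len(concepts)))
    for _ in range(min(budget, len(candidates))):
        best, best_score = None, None
        for i in candidates:
            new_concepts = len(concepts[i] - covered)  # concepts not yet in the batch
            score = new_concepts - margins[i]
            if best_score is None or score > best_score:
                best, best_score = i, score
        selected.append(best)
        covered |= concepts[best]
        candidates.remove(best)
    return selected

# Tiny usage example with three inputs and three classes:
probs = [[0.90, 0.05, 0.05],   # confident -> large margin
         [0.40, 0.35, 0.25],   # uncertain
         [0.34, 0.33, 0.33]]   # most uncertain
concepts = [{"cat"}, {"dog"}, {"dog"}]
picked = greedy_hybrid_select(probs, concepts, budget=2)
```

Because concept-set operations replace pairwise distance computations in an embedding space, a scheme of this shape scales near-linearly in the candidate pool, which is consistent with the paper's claim of selection times close to plain Margin.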
Problem

Research questions and friction points this paper is trying to address.

input selection
diversity-based selection
deep neural networks
labeling cost
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-Based Diversity
Vision-Language Models
Input Selection
Active Learning
Efficient DNN Fine-tuning