Enabling Small Models for Zero-Shot Selection and Reuse through Model Label Learning

📅 2024-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Expert models excel in their specialized domains but cannot adapt zero-shot to novel tasks, while broadly generalizing models fall short of expert-level performance. Method: This paper proposes Model Label Learning (MLL), a new paradigm that constructs a Semantic Directed Acyclic Graph (SDAG) to explicitly model the functional semantics of models, enabling interpretable alignment between model capabilities and task requirements; it further introduces the Classification Head Combination Optimization (CHCO) algorithm for zero-shot, fine-tuning-free model selection. The resulting model hub enables plug-and-play reuse of expert models via label-driven orchestration. Contribution/Results: Experiments across seven real-world datasets demonstrate significant gains in zero-shot accuracy—especially for small models—and show that zero-shot performance improves consistently as the hub scales. MLL thus unifies strong generalization, scalability, and retention of expert-model accuracy.

📝 Abstract
Vision-language models (VLMs) like CLIP have demonstrated impressive zero-shot ability in image classification tasks by aligning text and images, but they suffer from inferior performance compared with task-specific expert models. Conversely, expert models excel in their specialized domains but lack zero-shot ability on new tasks. Obtaining both the high performance of expert models and zero-shot ability is an important research direction. In this paper, we demonstrate that by constructing a model hub and aligning models with their functionalities using model labels, new tasks can be solved in a zero-shot manner by effectively selecting and reusing models in the hub. We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities through a Semantic Directed Acyclic Graph (SDAG) and leverages an algorithm, Classification Head Combination Optimization (CHCO), to select capable models for new tasks. Compared with the foundation-model paradigm, it is less costly and more scalable, i.e., zero-shot ability grows with the size of the model hub. Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL, demonstrating that expert models can be effectively reused for zero-shot tasks. Our code will be released publicly.
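The abstract's core idea — annotate each expert model in a hub with semantic labels describing what it can predict, then match a new task to models by those labels — can be illustrated with a minimal sketch. This is a hypothetical simplification for intuition only: the names (`HubEntry`, `select_models`) and the flat label-overlap scoring are assumptions, not the paper's actual SDAG structure or CHCO algorithm.

```python
# Hypothetical sketch of label-driven zero-shot model selection.
# Each hub entry carries semantic labels for the classes its expert
# model covers; a new task is matched by label coverage. (Not the
# paper's SDAG/CHCO method, which is far richer than flat overlap.)

from dataclasses import dataclass, field

@dataclass
class HubEntry:
    name: str                                  # identifier of the expert model
    labels: set = field(default_factory=set)   # semantic labels it covers

def select_models(hub, task_classes, top_k=1):
    """Rank hub models by the fraction of the task's classes their labels cover."""
    scored = []
    for entry in hub:
        coverage = len(entry.labels & task_classes) / len(task_classes)
        scored.append((coverage, entry.name))
    scored.sort(reverse=True)
    # Return only models that cover at least one task class.
    return [name for cov, name in scored[:top_k] if cov > 0]

hub = [
    HubEntry("bird-expert", {"sparrow", "eagle", "owl"}),
    HubEntry("pet-expert", {"cat", "dog", "hamster"}),
    HubEntry("vehicle-expert", {"car", "truck", "bus"}),
]
print(select_models(hub, {"cat", "dog"}))  # -> ['pet-expert']
```

In this toy version, selection is purely a set-overlap ranking; the paper's contribution is precisely that a semantic graph plus head-combination optimization does this alignment more expressively than flat label matching.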
Problem

Research questions and friction points this paper is trying to address.

Adaptability
Expert Models
Task Versatility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Tagging
Toolbox Construction
Efficient Problem Solving
Jia Zhang
National Key Laboratory for Novel Software Technology, Nanjing University, School of Artificial Intelligence, Nanjing University
Zhi Zhou
National Key Laboratory for Novel Software Technology, Nanjing University, School of Artificial Intelligence, Nanjing University
Lan-Zhe Guo
LAMDA Group, Nanjing University
Machine Learning
Yu-Feng Li
Professor, Nanjing University
Machine Learning