Efficient and Effective In-context Demonstration Selection with Coreset

📅 2025-11-12
🤖 AI Summary
In-context learning (ICL) for vision-language models (VLMs) critically depends on example selection, a provably NP-hard problem. Existing strategies, including random sampling, similarity-based selection, and information-theoretic scoring, struggle to balance computational efficiency and effectiveness. To address this, we propose CoDR, the first framework to adapt the coreset paradigm to ICL example selection. CoDR constructs a diverse core subset via clustering-based pruning and introduces a two-stage retrieval mechanism that jointly optimizes example similarity to the query and mutual information among selected examples, under query-alignment constraints. The method comprises three key components: cluster pruning, diversity-aware subset construction, and collaborative dual-stage retrieval. Extensive experiments across multiple vision-language benchmarks demonstrate that CoDR consistently outperforms state-of-the-art baselines, achieving significant gains in both ICL accuracy and computational efficiency.

📝 Abstract
In-context learning (ICL) has emerged as a powerful paradigm for Large Visual Language Models (LVLMs), enabling them to leverage a few examples directly from input contexts. However, the effectiveness of this approach relies heavily on the selection of demonstrations, a process that is NP-hard. Traditional strategies, including random, similarity-based, and infoscore-based sampling, often lead to inefficiencies or suboptimal performance, struggling to balance efficiency and effectiveness in demonstration selection. In this paper, we propose a novel demonstration selection framework named Coreset-based Dual Retrieval (CoDR). We show that samples within a diverse subset achieve a higher expected mutual information. To implement this, we introduce a cluster-pruning method to construct a coreset that aligns more effectively with the query while maintaining diversity. Additionally, we develop a dual retrieval mechanism that enhances the selection process by achieving global demonstration selection while preserving efficiency. Experimental results demonstrate that our method significantly improves ICL performance compared to existing strategies, providing a robust solution for effective and efficient demonstration selection.
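The cluster-pruning idea can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the clustering method or the pruning rule, so the sketch assumes a small k-means with farthest-point initialization and keeps the members nearest each centroid. The names `build_coreset` and `keep_per_cluster`, and the use of plain coordinate tuples in place of real demonstration embeddings, are all hypothetical.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; returns final centroids and assignments."""
    centroids = farthest_point_init(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assign

def build_coreset(points, k, keep_per_cluster=1):
    """Assumed pruning rule: keep the members closest to each centroid,
    so every cluster stays represented in the coreset."""
    centroids, assign = kmeans(points, k)
    coreset = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: math.dist(points[i], centroids[c]))
        coreset.extend(members[:keep_per_cluster])
    return sorted(coreset)

# Toy "embeddings": two well-separated groups of three points each.
pool = [(0, 0), (0, 1), (2, 0), (10, 10), (10, 11), (12, 10)]
print(build_coreset(pool, k=2))  # indices into `pool`, one per cluster
```

Keeping one representative per cluster is the simplest way to preserve coverage of the pool while shrinking it; a real system would cluster learned image-text embeddings and would likely keep more than one member per cluster.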
Problem

Research questions and friction points this paper is trying to address.

Selecting optimal demonstrations for in-context learning is NP-hard
Existing methods struggle to balance efficiency and effectiveness
Current approaches often yield suboptimal performance in demonstration selection
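A quick count shows why brute-force search over demonstrations is impractical. A large search space is not in itself a proof of NP-hardness, but it illustrates the scale of the problem. The pool size of 1000 and the 4-shot setting are illustrative values, not figures from the paper.

```python
import math

def num_demo_sets(pool_size: int, shots: int) -> int:
    """Unordered ways to choose `shots` demonstrations from the pool."""
    return math.comb(pool_size, shots)

def num_demo_sequences(pool_size: int, shots: int) -> int:
    """Ordered ways, relevant when demonstration order changes the prompt."""
    return math.perm(pool_size, shots)

print(num_demo_sets(1000, 4))       # 41417124750
print(num_demo_sequences(1000, 4))  # 994010994000
```

Even at 4 shots from 1000 candidates, scoring every candidate set with a model forward pass would require tens of billions of evaluations per query.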
Innovation

Methods, ideas, or system contributions that make the work stand out.

Coreset-based dual retrieval for demonstration selection
Cluster-pruning method constructs diverse coreset efficiently
Dual retrieval mechanism ensures global selection with efficiency
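One possible reading of a two-stage retrieval over a coreset is sketched below. It does not reproduce CoDR's dual retrieval mechanism, which this page does not specify. As a stand-in for the paper's mutual-information objective, the second stage uses maximal marginal relevance (MMR), a standard way to trade query similarity against redundancy. The function `two_stage_select` and its parameters `shortlist_size` and `trade_off` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def two_stage_select(query, coreset, shots, shortlist_size, trade_off=0.5):
    """Return indices into `coreset` for the chosen demonstrations."""
    # Stage 1: cheap shortlist of the coreset items most similar to the query.
    ranked = sorted(range(len(coreset)),
                    key=lambda i: cosine(query, coreset[i]), reverse=True)
    shortlist = ranked[:shortlist_size]

    # Stage 2 (assumed, MMR): greedily add the item that is relevant to the
    # query but least redundant with what has already been selected.
    selected = []
    while shortlist and len(selected) < shots:
        def score(i):
            redundancy = max((cosine(coreset[i], coreset[j]) for j in selected),
                             default=0.0)
            return (trade_off * cosine(query, coreset[i])
                    - (1 - trade_off) * redundancy)
        best = max(shortlist, key=score)
        selected.append(best)
        shortlist.remove(best)
    return selected

# Toy embeddings: items 0 and 1 are near-duplicates, item 2 is distinct.
coreset = [(1.0, 0.0), (1.0, 0.05), (0.5, 1.0), (-1.0, 0.0)]
query = (1.0, 0.3)
print(two_stage_select(query, coreset, shots=2, shortlist_size=3))
```

With `trade_off=1.0` the second stage reduces to pure similarity ranking and picks the two near-duplicates; lowering it makes the selection skip the duplicate in favor of the distinct item. Restricting both stages to the coreset rather than the full pool is what keeps the per-query cost small in this sketch.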
Zihua Wang
PhD student of Southeast University
computer science
Jiarui Wang
School of Computer Science and Engineering and the Key Laboratory of New Generation Artificial Intelligence Technology and its Interdisciplinary Applications, Southeast University, Nanjing 210096, China.
Haiyang Xu
Tongyi Lab, Alibaba Group.
Ming Yan
Tongyi Lab, Alibaba Group.
Fei Huang
Tongyi Lab, Alibaba Group.
Xu Yang
School of Computer Science and Engineering and the Key Laboratory of New Generation Artificial Intelligence Technology and its Interdisciplinary Applications, Southeast University, Nanjing 210096, China.
Xiu-Shen Wei
Professor, Southeast University
Computer Vision, Machine Learning, Artificial Intelligence
Siya Mi
School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China. Purple Mountain Laboratories, Nanjing 210000, China.
Yu Zhang
School of Computer Science and Engineering and the Key Laboratory of New Generation Artificial Intelligence Technology and its Interdisciplinary Applications, Southeast University, Nanjing 210096, China.