🤖 AI Summary
To address the imbalance between relevance and diversity in few-shot exemplar selection, this paper proposes RDES: a reinforcement learning–based method that jointly optimizes relevance and diversity via Q-learning integrated with a label-distribution-driven diversity scoring mechanism. RDES can be combined with Chain-of-Thought (CoT) reasoning and is evaluated across four text classification and reasoning benchmarks and twelve large language models, consistently outperforming ten baselines; CoT integration further boosts accuracy. Key contributions are: (1) a learnable, policy-based exemplar selection strategy that eliminates reliance on handcrafted heuristics; (2) a label-aware diversity metric that explicitly enforces class distribution balance; and (3) an end-to-end, task-adaptive dynamic optimization framework for in-context learning (ICL) exemplar sets.
📝 Abstract
Diversity in demonstration selection is crucial for enhancing model generalization, as it enables broader coverage of structures and concepts. However, constructing an appropriate demonstration set remains an open challenge. This paper presents Relevance-Diversity Enhanced Selection (RDES), an approach that leverages reinforcement learning to optimize the selection of diverse reference demonstrations for text classification with Large Language Models (LLMs), especially in few-shot prompting scenarios. RDES employs a Q-learning framework to dynamically identify demonstrations that maximize both diversity and relevance to the classification objective, computing a diversity score from the label distribution of the selected demonstrations. This ensures a balanced representation of reference data and leads to improved classification accuracy. Through extensive experiments on four benchmark datasets with 12 closed-source and open-source LLMs, we demonstrate that RDES significantly improves classification accuracy over ten established baselines. Furthermore, we investigate incorporating Chain-of-Thought (CoT) reasoning into the prediction process, which further enhances the model's predictive performance. The results underscore the potential of reinforcement learning for adaptive demonstration selection and deepen the understanding of classification challenges.
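To make the mechanism concrete, the sketch below shows one plausible reading of the abstract: tabular Q-learning over which demonstration to add next, with a reward that blends a per-candidate relevance score and a label-distribution diversity score (here, normalized label entropy). All names, the reward weighting `lam`, and the state encoding are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random
from collections import Counter

# Hypothetical sketch of RDES-style selection; the paper's real state
# encoding, relevance measure, and reward design may differ.

def diversity_score(labels):
    """Normalized entropy of the label distribution over selected demos."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    total = len(labels)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    max_entropy = math.log(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy

def select_demonstrations(pool, relevance, k=4, episodes=200,
                          alpha=0.5, gamma=0.9, epsilon=0.2, lam=0.5, seed=0):
    """Tabular Q-learning over which demonstration index to add next.

    pool: list of (text, label) candidates; relevance: floats in [0, 1],
    one per candidate (e.g. query similarity). lam trades relevance
    against diversity in the reward.
    """
    rng = random.Random(seed)
    q = {}  # (frozenset of chosen indices, candidate index) -> Q-value
    for _ in range(episodes):
        chosen = []
        for _ in range(k):
            state = frozenset(chosen)
            candidates = [i for i in range(len(pool)) if i not in chosen]
            if rng.random() < epsilon:  # epsilon-greedy exploration
                action = rng.choice(candidates)
            else:
                action = max(candidates, key=lambda i: q.get((state, i), 0.0))
            new_labels = [pool[i][1] for i in chosen] + [pool[action][1]]
            # Reward blends candidate relevance with resulting label diversity.
            reward = (lam * relevance[action]
                      + (1 - lam) * diversity_score(new_labels))
            chosen.append(action)
            next_state = frozenset(chosen)
            rest = [i for i in range(len(pool)) if i not in chosen]
            best_next = max((q.get((next_state, i), 0.0) for i in rest),
                            default=0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    # Greedy rollout with the learned Q-table.
    chosen = []
    for _ in range(k):
        state = frozenset(chosen)
        candidates = [i for i in range(len(pool)) if i not in chosen]
        chosen.append(max(candidates, key=lambda i: q.get((state, i), 0.0)))
    return [pool[i] for i in chosen]
```

The selected `(text, label)` pairs would then be formatted into the few-shot prompt, optionally with CoT rationales appended to each demonstration.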