🤖 AI Summary
Existing in-context learning (ICL) demonstration selection methods rely on heuristic metrics, exhibiting poor robustness and limited cross-model generalization. To address this, we propose D.Va (Demonstration Validation), the first framework to introduce a *demonstration validation* perspective for ICL. D.Va dynamically selects demonstrations that are both effective and generalizable through three synergistic components: multi-dimensional effectiveness evaluation, cross-model consistency verification, and joint optimization with retrieval models. Unlike conventional static scoring approaches, D.Va establishes a transferable, validation-driven demonstration selection paradigm. Extensive experiments demonstrate that D.Va consistently outperforms state-of-the-art methods across diverse natural language understanding (NLU) and natural language generation (NLG) benchmarks. Moreover, it exhibits strong robustness and cross-architecture generalization across multiple large language models (LLMs) and retrieval models.
📝 Abstract
In-context learning (ICL) has demonstrated significant potential in enhancing the capabilities of large language models (LLMs) during inference. It is well established that ICL heavily relies on selecting effective demonstrations to generate outputs that better align with the expected results. For demonstration selection, previous approaches have typically relied on intuitive metrics to evaluate the effectiveness of demonstrations, which often results in limited robustness and poor cross-model generalization. To tackle these challenges, we propose a novel method, **D**emonstration **VA**lidation (**D.Va**), which integrates a demonstration validation perspective into this field. By introducing the demonstration validation mechanism, our method effectively identifies demonstrations that are both effective and highly generalizable. **D.Va** surpasses all existing demonstration selection techniques across both natural language understanding (NLU) and natural language generation (NLG) tasks. Additionally, we demonstrate the robustness and generalizability of our approach across various language models with different retrieval models.
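To make the "demonstration validation" idea concrete, here is a minimal, hypothetical sketch of validation-driven demonstration selection: each candidate demonstration is scored by how well the model performs on held-out validation examples when that demonstration is placed in context, and the top-scoring demonstrations are kept. This is only an illustration of the general paradigm; `select_demonstrations`, `toy_loglik`, and the scoring loop are illustrative stand-ins, not D.Va's actual multi-dimensional evaluation, cross-model consistency verification, or retriever co-training.

```python
# Hypothetical sketch of validation-driven demonstration selection.
# `loglik` stands in for a real LM's conditional log-likelihood of the
# answer y given the demonstration and query x; D.Va's actual scoring
# is more involved than this toy ranking loop.

def select_demonstrations(candidates, validation_set, loglik, k=2):
    """Rank candidate demonstrations by average validation score with
    each demonstration in context, then keep the top-k."""
    scores = []
    for demo in candidates:
        score = sum(loglik(demo, x, y) for x, y in validation_set) / len(validation_set)
        scores.append((score, demo))
    scores.sort(key=lambda t: t[0], reverse=True)
    return [demo for _, demo in scores[:k]]

# Toy stand-in for an LM: pretend demonstrations sharing more words
# with the validation example help the model more.
def toy_loglik(demo, x, y):
    return float(len(set(demo.split()) & set((x + " " + y).split())))

candidates = [
    "translate cat to chat",
    "sum 2 and 3",
    "translate dog to chien",
]
validation = [("translate bird to", "oiseau")]
print(select_demonstrations(candidates, validation, toy_loglik, k=2))
# → ['translate cat to chat', 'translate dog to chien']
```

In a real pipeline, the toy scoring function would be replaced by actual LM likelihoods, and the resulting rankings could supervise a retrieval model, as the abstract's joint-optimization component suggests.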