🤖 AI Summary
To address the performance limitations of in-context learning (ICL) in realistic settings, namely mixed-task prompting and noisy demonstrations, this paper proposes *Indirect ICL*, a paradigm that incorporates influence functions (IFs) into demonstration selection to quantify the influence of individual examples on target predictions. Building on this, the authors design IF-enhanced hybrid selection mechanisms (e.g., combining an IF surrogate with BSR) that overcome the limitations of surface-similarity-based methods such as cosine similarity and BERTScore-Recall. Evaluation across multi-task benchmarks (MMLU, BigBench, StrategyQA, CommonsenseQA, and GLUE) shows consistent gains: in the 3-shot and 5-shot mixed-task settings, average absolute accuracy improves by 0.37% and 1.45%, respectively; on noisy GLUE tasks, the IF-reweighted BSR and cosine-similarity selectors gain +2.94% and +2.90% accuracy. The core contribution is a theoretically grounded demonstration selection framework that yields consistent and robust performance improvements.
📝 Abstract
This work introduces a novel paradigm for generalized In-Context Learning (ICL), termed Indirect In-Context Learning. In Indirect ICL, we explore demonstration selection strategies tailored for two distinct real-world scenarios: Mixture of Tasks and Noisy Demonstrations. We systematically evaluate the effectiveness of Influence Functions (IFs) as a selection tool for these settings, highlighting the potential of IFs to better capture the informativeness of examples within the demonstration pool. For the Mixture of Tasks setting, demonstrations are drawn from 28 diverse tasks, including MMLU, BigBench, StrategyQA, and CommonsenseQA. We demonstrate that combining BERTScore-Recall (BSR) with an IF surrogate model can significantly improve performance, leading to average absolute accuracy gains of 0.37% and 1.45% in the 3-shot and 5-shot setups compared with traditional ICL metrics. In the Noisy Demonstrations setting, we examine scenarios where demonstrations might be mislabeled. Our experiments show that reweighting traditional ICL selectors (BSR and Cosine Similarity) with IF-based selectors boosts accuracy by an average of 2.90% for Cosine Similarity and 2.94% for BSR on noisy GLUE benchmarks. In sum, we propose a robust framework for demonstration selection that generalizes beyond traditional ICL, offering valuable insights into the role of IFs for Indirect ICL.
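The reweighting idea underlying both settings, combining a surface-similarity selector (BSR or cosine similarity) with IF-based scores before picking the top-k demonstrations, can be sketched as a simple linear combination. The function name, the min-max normalization, and the mixing weight `alpha` below are illustrative assumptions for exposition, not the paper's exact formulation.

```python
def select_demonstrations(sim_scores, if_scores, k=5, alpha=0.5):
    """Rank candidate demonstrations by a weighted combination of a
    surface-similarity score (e.g., BSR or cosine similarity) and an
    influence-function (IF) score, returning the top-k indices.

    This is a minimal sketch: the linear reweighting and `alpha` are
    assumed here, not taken from the paper.
    """
    def norm(xs):
        # Min-max normalize so the two score scales are comparable.
        lo, hi = min(xs), max(xs)
        if hi == lo:
            return [0.0] * len(xs)
        return [(x - lo) / (hi - lo) for x in xs]

    sim = norm(sim_scores)
    inf = norm(if_scores)
    combined = [alpha * s + (1 - alpha) * i for s, i in zip(sim, inf)]
    # Indices of the k highest combined scores, best first.
    return sorted(range(len(combined)),
                  key=lambda j: combined[j], reverse=True)[:k]
```

For example, with `alpha=0.7` the ranking leans toward the similarity signal while still letting a high-influence but less similar example displace a purely similar one, which mirrors the hybrid behaviour the summary describes.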