🤖 AI Summary
To address the lack of knowledge coverage in in-context learning (ICL) example selection, this paper proposes TopicK, a novel framework that first identifies the topic-level knowledge required by test inputs via topic modeling, then evaluates the proficiency of the large language model (LLM) across those topics, and finally selects examples iteratively with a maximum-coverage strategy, prioritizing examples that cover required topics that remain uncovered and in which the model is weak. TopicK is the first approach to incorporate explicit topic coverage into example retrieval, overcoming the limitations of conventional methods that rely solely on semantic similarity or generation probability. Extensive experiments across multiple benchmarks and both open- and closed-source LLMs demonstrate that TopicK significantly improves ICL performance and stability, achieves higher knowledge-utilization efficiency, and exhibits stronger generalization.
📝 Abstract
The effectiveness of in-context learning relies heavily on selecting demonstrations that provide all the necessary information for a given test input. To achieve this, it is crucial to identify and cover fine-grained knowledge requirements. However, prior methods often retrieve demonstrations based solely on embedding similarity or generation probability, resulting in irrelevant or redundant examples. In this paper, we propose TopicK, a topic coverage-based retrieval framework that selects demonstrations to comprehensively cover topic-level knowledge relevant to both the test input and the model. Specifically, TopicK estimates the topics required by the input and assesses the model's knowledge on those topics. TopicK then iteratively selects demonstrations that introduce previously uncovered required topics, in which the model exhibits low topical knowledge. We validate the effectiveness of TopicK through extensive experiments across various datasets and both open- and closed-source LLMs. Our source code is available at https://github.com/WonbinKweon/TopicK_EMNLP2025.
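The iterative selection described above, picking demonstrations that add previously uncovered required topics weighted by how weak the model is on them, follows the shape of a greedy weighted maximum-coverage procedure. The sketch below illustrates that idea under stated assumptions; the function name, data layout, and the `1 - proficiency` weighting are illustrative choices, not the paper's actual implementation (see the linked repository for that).

```python
def select_demonstrations(required_topics, model_proficiency, candidates, k):
    """Greedy topic-coverage selection sketch (hypothetical, not TopicK's code).

    required_topics:   set of topics the test input needs
    model_proficiency: dict mapping topic -> score in [0, 1] (higher = stronger)
    candidates:        list of (demo_id, set_of_topics) pairs
    k:                 number of demonstrations to select
    """
    selected = []
    covered = set()
    pool = list(candidates)
    for _ in range(k):
        best, best_gain = None, 0.0
        for cand_id, topics in pool:
            # Marginal gain: newly covered required topics, each weighted
            # by how weak the model is on that topic (1 - proficiency).
            new = (topics & required_topics) - covered
            gain = sum(1.0 - model_proficiency.get(t, 0.0) for t in new)
            if gain > best_gain:
                best, best_gain = (cand_id, topics), gain
        if best is None:
            break  # no remaining candidate adds an uncovered required topic
        selected.append(best[0])
        covered |= best[1] & required_topics
        pool.remove(best)
    return selected
```

For example, with required topics {"calculus", "probability", "linear_algebra"} and a model that is strong on calculus (0.9) but weak on probability (0.2), a demonstration covering probability and linear algebra is picked before one covering only calculus, since its weighted marginal gain is larger.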