Hybrid Pooling with LLMs via Relevance Context Learning

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high cost of manual query relevance annotation and the limitations of existing large language model (LLM)-based automatic evaluation methods in modeling topical relevance judgments. To overcome these challenges, the authors propose the Relevance Context Learning (RCL) framework, which reformulates relevance assessment around explicit, context-aware narrative criteria. Specifically, an Instructor LLM distills relevance rules from a small set of labeled examples and uses them to guide an Assessor LLM toward more accurate judgments. Moving beyond conventional in-context learning, which relies solely on example demonstrations, RCL combines a hybrid prompting strategy with tailored prompt engineering. Experiments across multiple datasets show that RCL significantly outperforms both zero-shot and standard in-context learning approaches, improving the quality and efficiency of automated relevance annotation.
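The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `call_llm` is a stand-in for any chat-completion API, and all function names and prompt wordings are hypothetical.

```python
# Illustrative sketch of the Instructor -> Assessor flow in RCL.
# All names and prompt texts are hypothetical; `call_llm` is a placeholder.

def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would call an actual LLM endpoint.
    return "[LLM response]"

def build_instructor_prompt(query: str, judged_pairs: list) -> str:
    """Ask the Instructor LLM to distill a relevance narrative from labeled examples."""
    examples = "\n".join(
        f"Document: {doc}\nLabel: {'relevant' if label else 'non-relevant'}"
        for doc, label in judged_pairs
    )
    return (
        f"Query: {query}\n{examples}\n\n"
        "Based on these judged examples, write an explicit narrative describing "
        "what makes a document relevant to this topic."
    )

def build_assessor_prompt(query: str, narrative: str, document: str) -> str:
    """Guide the Assessor LLM with the distilled narrative instead of raw examples."""
    return (
        f"Query: {query}\nRelevance criteria: {narrative}\n"
        f"Document: {document}\n"
        "Answer 'relevant' or 'non-relevant'."
    )

def rcl_judge(query: str, judged_pairs: list, document: str) -> str:
    """Two-stage RCL: distill a narrative, then judge a new document with it."""
    narrative = call_llm(build_instructor_prompt(query, judged_pairs))
    return call_llm(build_assessor_prompt(query, narrative, document))
```

The key design point is that the Assessor prompt carries a distilled criterion rather than the raw labeled pairs, which is what distinguishes RCL from standard in-context learning.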

📝 Abstract
High-quality relevance judgements over large query sets are essential for evaluating Information Retrieval (IR) systems, yet manual annotation remains costly and time-consuming. Large Language Models (LLMs) have recently shown promise as automatic relevance assessors, but their reliability is still limited. Most existing approaches rely on zero-shot prompting or In-Context Learning (ICL) with a small number of labeled examples. However, standard ICL treats examples as independent instances and fails to explicitly capture the underlying relevance criteria of a topic, restricting its ability to generalize to unseen query-document pairs. To address this limitation, we introduce Relevance Context Learning (RCL), a novel framework that leverages human relevance judgements to explicitly model topic-specific relevance criteria. Rather than directly using labeled examples for in-context prediction, RCL first prompts an LLM (Instructor LLM) to analyze sets of judged query-document pairs and generate explicit narratives that describe what constitutes relevance for a given topic. These relevance narratives are then used as structured prompts to guide a second LLM (Assessor LLM) in producing relevance judgements. To evaluate RCL in a realistic data collection setting, we propose a hybrid pooling strategy in which a shallow depth-k pool from participating systems is judged by human assessors, while the remaining documents are labeled by LLMs. Experimental results demonstrate that RCL substantially outperforms zero-shot prompting and consistently improves over standard ICL. Overall, our findings indicate that transforming relevance examples into explicit, context-aware relevance narratives is a more effective way of exploiting human judgements for LLM-based IR dataset construction.
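The hybrid pooling strategy in the abstract can be sketched as a simple split: the top-k documents from each participating system form the human-judged pool, and the remaining pooled documents are routed to the LLM assessor. This is a minimal illustration under assumed data shapes; the function name and run format are not from the paper.

```python
# Hedged sketch of hybrid depth-k pooling: humans judge the shallow
# depth-k pool, an LLM labels the remainder. Names are illustrative.

def hybrid_pool(runs: dict, k: int):
    """Split pooled documents into a human-judged depth-k pool and an LLM-judged remainder.

    `runs` maps a system name to its ranked list of document IDs for one topic.
    Returns (human_pool, llm_pool) as sets of document IDs.
    """
    human_pool = set()
    for ranking in runs.values():
        human_pool.update(ranking[:k])  # shallow depth-k pool per system
    full_pool = {doc for ranking in runs.values() for doc in ranking}
    llm_pool = full_pool - human_pool  # remaining documents go to the LLM
    return human_pool, llm_pool

# Toy example with two systems and depth k=2:
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d2", "d5", "d1", "d6"],
}
human, llm = hybrid_pool(runs, k=2)
# human == {"d1", "d2", "d5"}; llm == {"d3", "d4", "d6"}
```

Note that a document retrieved deep by one system but in the top-k of another still lands in the human pool, mirroring standard depth-k pooling conventions.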
Problem

Research questions and friction points this paper is trying to address.

relevance judgement
information retrieval
large language models
in-context learning
dataset construction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Relevance Context Learning
Large Language Models
In-Context Learning
Hybrid Pooling
Information Retrieval