Retrieving Versus Understanding Extractive Evidence in Few-Shot Learning

📅 2025-02-19
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work investigates the interplay between retrieval and comprehension of in-document extractive evidence by large language models (LLMs) in few-shot learning, asking whether prediction errors stem from retrieval failures and, if so, what causes those failures. The authors conduct error-attribution analysis and two-stage ablation studies across five datasets using two representative closed-source LLMs, with human-annotated gold-standard evidence as ground truth. The key finding is that prediction errors are strongly coupled with retrieval errors; retrieval failure, however, is attributable primarily not to model comprehension deficits but to shortcomings in the evidence itself (e.g., low clarity or incompleteness). Crucially, improving retrieval accuracy yields significant gains in final prediction performance. These results provide empirical grounding and actionable optimization directions for downstream tasks built on evidence retrieval.

📝 Abstract
A key aspect of alignment is the proper use of within-document evidence to construct document-level decisions. We analyze the relationship between the retrieval and interpretation of within-document evidence by large language models in a few-shot setting. Specifically, we measure the extent to which model prediction errors are associated with evidence retrieval errors, relative to gold-standard human-annotated extractive evidence, across five datasets and two popular proprietary models. We perform two ablation studies to investigate when both label prediction and evidence retrieval errors can be attributed to qualities of the relevant evidence. We find a strong empirical relationship between model prediction and evidence retrieval error, but evidence retrieval error is mostly not associated with evidence interpretation error, a hopeful sign for downstream applications built on this mechanism.
Problem

Research questions and friction points this paper is trying to address.

Do LLM prediction errors in few-shot learning stem from failures to retrieve in-document evidence?
How strongly are prediction errors coupled with evidence retrieval errors?
Are retrieval errors driven by model comprehension limits or by the quality of the evidence itself?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Error-attribution analysis linking prediction errors to evidence retrieval errors
Two-stage ablation studies across five datasets with two proprietary LLMs
Human-annotated gold-standard extractive evidence used as ground truth