Evaluating the Utility of Grounding Documents with Reference-Free LLM-based Metrics

📅 2026-01-30

🤖 AI Summary

This work addresses the lack of an effective way to evaluate how useful grounding documents are in retrieval-augmented generation (RAG) systems: existing approaches either rely on costly human annotations or ignore the capabilities of the specific large language model (LLM) being used. The authors propose GroGU, an annotation-free, model-specific utility metric that quantifies the contribution of retrieved documents from the entropy of the LLM's output during generation, without requiring reference texts. They then use this metric to select preference data for direct preference optimization (DPO) training of a query rewriter. Experiments show improvements of up to 18.2 points in Mean Reciprocal Rank and up to 9.4 points in answer accuracy.

📝 Abstract

Retrieval Augmented Generation (RAG)'s success depends on the utility the LLM derives from the content used for grounding. Quantifying content utility does not have a definitive specification and existing metrics ignore model-specific capabilities and/or rely on costly annotations. In this paper, we propose Grounding Generation Utility (GroGU), a model-specific and reference-free metric that defines utility as a function of the downstream LLM's generation confidence based on entropy. Despite having no annotation requirements, GroGU is largely faithful in distinguishing ground-truth documents while capturing nuances ignored by LLM-agnostic metrics. We apply GroGU to train a query-rewriter for RAG by identifying high-utility preference data for Direct Preference Optimization. Experiments show improvements by up to 18.2 points in Mean Reciprocal Rank and up to 9.4 points in answer accuracy.
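The abstract does not give GroGU's exact formula, so the following is only a minimal sketch of the general idea of an entropy-based, reference-free utility score. It assumes (this is not stated in the source) that utility is measured as the drop in the LLM's mean next-token entropy when the document is added to the prompt. The names `token_entropy`, `mean_entropy`, `entropy_utility`, and `pick_preference_pair` are hypothetical, and the probability lists are toy values rather than real model outputs.

```python
import math
from typing import Dict, Sequence, Tuple


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(step_distributions: Sequence[Sequence[float]]) -> float:
    """Average entropy over all generation steps."""
    entropies = [token_entropy(d) for d in step_distributions]
    return sum(entropies) / len(entropies)


def entropy_utility(
    without_doc: Sequence[Sequence[float]],
    with_doc: Sequence[Sequence[float]],
) -> float:
    """ASSUMED definition: entropy without the document minus entropy with it.

    A positive value means the model generated more confidently when grounded.
    """
    return mean_entropy(without_doc) - mean_entropy(with_doc)


def pick_preference_pair(utilities: Dict[str, float]) -> Tuple[str, str]:
    """Hypothetical helper: (chosen, rejected) = highest- and lowest-utility candidates."""
    chosen = max(utilities, key=utilities.get)
    rejected = min(utilities, key=utilities.get)
    return chosen, rejected


# Toy per-step distributions over a 4-token vocabulary (illustrative only).
ungrounded = [[0.25, 0.25, 0.25, 0.25], [0.5, 0.5, 0.0, 0.0]]
grounded = [[0.97, 0.01, 0.01, 0.01], [0.9, 0.1, 0.0, 0.0]]

utility = entropy_utility(ungrounded, grounded)
print(f"utility = {utility:.3f}")  # positive: the document reduced uncertainty
```

In a DPO setup along the lines the abstract describes, scores like these could rank candidate query rewrites by the utility of the documents each one retrieves, with `pick_preference_pair` supplying the chosen and rejected examples. How the paper actually constructs its preference pairs is not specified here.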

Problem

Research questions and friction points this paper is trying to address.

Grounding Utility

Retrieval Augmented Generation

Reference-Free Evaluation

LLM-based Metrics

Content Utility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Grounding Generation Utility

Retrieval Augmented Generation

Reference-Free Metric