LLM-Confidence Reranker: A Training-Free Approach for Enhancing Retrieval-Augmented Generation Systems

πŸ“… 2026-02-01
πŸ›οΈ Expert systems with applications
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the susceptibility of retrieval-augmented generation (RAG) systems to hallucination in knowledge-intensive tasks, a limitation often rooted in suboptimal retrieval and reranking quality. The authors propose a training-free, plug-and-play reranking method that leverages intrinsic confidence signals from black-box large language models (LLMs), specifically the Maximum Semantic Cluster Proportion (MSCP), to refine document rankings. By combining multinomial sampling, semantic clustering, and query-document confidence-threshold binning, the approach enables multi-level, parallelizable reranking compatible with diverse retrievers. Evaluated on BEIR and TREC benchmarks using only 7–9B-parameter pre-trained LLMs, the method achieves up to a 20.6% improvement in NDCG@5 without any performance degradation, significantly mitigating hallucination risks in high-stakes applications such as medical diagnosis.

πŸ“ Abstract
Large language models (LLMs) have revolutionized natural language processing, yet hallucinations in knowledge-intensive tasks remain a critical challenge. Retrieval-augmented generation (RAG) addresses this by integrating external knowledge, but its efficacy depends on accurate document retrieval and ranking. Although existing rerankers demonstrate effectiveness, they frequently necessitate specialized training, impose substantial computational expenses, and fail to fully exploit the semantic capabilities of LLMs, particularly their inherent confidence signals. We propose the LLM-Confidence Reranker (LCR), a training-free, plug-and-play algorithm that enhances reranking in RAG systems by leveraging black-box LLM confidence derived from Maximum Semantic Cluster Proportion (MSCP). LCR employs a two-stage process: confidence assessment via multinomial sampling and clustering, followed by binning and multi-level sorting based on query and document confidence thresholds. This approach prioritizes relevant documents while preserving original rankings for high-confidence queries, ensuring robustness. Evaluated on BEIR and TREC benchmarks with BM25 and Contriever retrievers, LCR, using only 7–9B-parameter pre-trained LLMs, consistently improves NDCG@5 by up to 20.6% across pre-trained LLM and fine-tuned Transformer rerankers, without degradation. Ablation studies validate the hypothesis that LLM confidence positively correlates with document relevance, elucidating LCR's mechanism. LCR offers computational efficiency, parallelism for scalability, and broad compatibility, mitigating hallucinations in applications like medical diagnosis.
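The two-stage procedure the abstract outlines (an MSCP confidence score from sampled answers, then threshold binning with the original order preserved inside each bin and for high-confidence queries) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the greedy clustering, the threshold values, and the pluggable equivalence check are all illustrative assumptions.

```python
def mscp(samples, are_equivalent):
    """Maximum Semantic Cluster Proportion: fraction of sampled answers
    that fall into the largest semantic-equivalence cluster (greedy
    single-link clustering; the equivalence predicate is an assumption)."""
    clusters = []  # each cluster is a list of mutually equivalent samples
    for s in samples:
        for cluster in clusters:
            if are_equivalent(s, cluster[0]):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return max(len(c) for c in clusters) / len(samples)


def lcr_rerank(docs, doc_confidences, query_confidence,
               query_threshold=0.9, bin_edges=(0.33, 0.66)):
    """Bin documents by confidence, order bins high-to-low, and keep the
    retriever's original order within each bin. Threshold values are
    illustrative, not taken from the paper."""
    if query_confidence >= query_threshold:
        return list(docs)  # preserve the original ranking for confident queries

    def bin_of(conf):
        for i, edge in enumerate(bin_edges):
            if conf < edge:
                return i
        return len(bin_edges)

    # Stable sort: higher-confidence bins first, original rank breaks ties.
    order = sorted(range(len(docs)),
                   key=lambda i: (-bin_of(doc_confidences[i]), i))
    return [docs[i] for i in order]
```

For example, with document confidences `[0.2, 0.9, 0.5]` and a low query confidence, the second document is promoted to the top while relative order within each confidence bin is left untouched; a high query confidence returns the retriever's ranking unchanged.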
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
Hallucination
LLM Confidence
Document Reranking
Training-Free
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM confidence
training-free reranking
retrieval-augmented generation
semantic clustering
hallucination mitigation
Zhipeng Song
School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, China
Xiangyu Kong
Beijing Information Science & Technology University
Audio/Speech Processing, Reinforcement Learning, Computer Vision
Xinrui Bao
School of Electronic and Information Engineering, Liaoning Technical University, 188 Longwan South Street, Sijiatun District, Huludao, 125105, China
Yizhi Zhou
George Mason University
Robotics, SLAM, State Estimation
Jiulong Jiao
School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, China
Sitong Liu
Duke University
Yuhang Zhou
Tencent (Dalian Northern Interactive Entertainment Technology Co., Ltd.), 21/F, Tencent Building, No. 26 Jingxian St, Ganjingzi District, Dalian, 116085, China
Heng Qi
School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, China