AuthorityBench: Benchmarking LLM Authority Perception for Reliable Retrieval-Augmented Generation

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of Retrieval-Augmented Generation (RAG) systems to low-authority information sources and the limited capacity of LLMs to perceive source authority. To this end, the authors formalize the notion of “authority awareness” for the first time and introduce AuthorityBench, the first multidimensional benchmark for evaluating LLM authority perception, comprising three datasets: DomainAuth, EntityAuth, and RAGAuth. Authority is quantified using PageRank and entity-popularity metrics, and three judging protocols are proposed: PointJudge, PairJudge, and ListJudge. Experiments show that ListJudge with PointScore output correlates strongly with ground-truth authority while remaining the most cost-effective, and that authority-guided document filtering significantly improves the factual accuracy of RAG-generated answers.
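The judging protocols above produce per-document authority scores that are then compared against ground-truth authority. A minimal sketch of that comparison, assuming hypothetical PointScore outputs and a PageRank-derived ground truth (the numbers below are made-up placeholders, and Spearman's rho is implemented by hand to stay dependency-free):

```python
# Illustrative sketch (not the paper's code): correlating LLM-judged
# authority scores with ground-truth authority via Spearman's rho.

def ranks(values):
    """1-based average ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: PageRank-based ground truth vs. PointScore outputs.
ground_truth = [8.2, 6.9, 5.1, 3.3, 1.0]  # e.g. log PageRank per domain
judged       = [9, 7, 6, 4, 2]            # model's 0-10 authority scores
print(round(spearman(ground_truth, judged), 3))  # → 1.0
```

A perfectly monotone judge reaches rho = 1.0; the benchmark's protocols are compared by how close their scores get to this.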

📝 Abstract
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) with external knowledge but remains vulnerable to low-authority sources that can propagate misinformation. We investigate whether LLMs can perceive information authority - a capability extending beyond semantic understanding. To address this, we introduce AuthorityBench, a comprehensive benchmark for evaluating LLM authority perception comprising three datasets: DomainAuth (10K web domains with PageRank-based authority), EntityAuth (22K entities with popularity-based authority), and RAGAuth (120 queries with documents of varying authority for downstream evaluation). We evaluate five LLMs using three judging methods (PointJudge, PairJudge, ListJudge) across multiple output formats. Results show that ListJudge and PairJudge with PointScore output achieve the strongest correlation with ground-truth authority, while ListJudge offers optimal cost-effectiveness. Notably, incorporating webpage text consistently degrades judgment performance, suggesting authority is distinct from textual style. Downstream experiments on RAG demonstrate that authority-guided filtering largely improves answer accuracy, validating the practical importance of authority perception for reliable knowledge retrieval. Code and benchmark are available at: https://github.com/Trustworthy-Information-Access/AuthorityBench.
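The downstream experiments rely on authority-guided filtering of retrieved documents before generation. The sketch below is an illustrative interpretation, not the paper's implementation: the `Doc` type, threshold, and scores are hypothetical, with the authority field standing in for a PointJudge-style 0-10 score from an LLM.

```python
# Illustrative sketch (assumptions labeled above): drop low-authority
# retrieved documents before they reach the generator.

from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    authority: float  # hypothetical LLM-judged score on a 0-10 scale

def filter_by_authority(docs, threshold=5.0, keep_min=1):
    """Keep documents at or above the threshold; if everything falls
    below it, keep the `keep_min` highest-authority documents so the
    generator is never left without context."""
    kept = [d for d in docs if d.authority >= threshold]
    if len(kept) < keep_min:
        kept = sorted(docs, key=lambda d: d.authority, reverse=True)[:keep_min]
    return kept

docs = [
    Doc("Official statistics page ...", authority=9.0),
    Doc("Anonymous forum post ...", authority=2.0),
    Doc("News article ...", authority=6.5),
]
context = filter_by_authority(docs)
print([d.authority for d in context])  # → [9.0, 6.5]
```

The fallback to the best `keep_min` documents is one simple design choice for avoiding an empty context window; the paper's actual filtering criterion may differ.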
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
Authority Perception
Misinformation
Large Language Models
Information Reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Authority Perception
Retrieval-Augmented Generation
Benchmarking
Large Language Models
Information Reliability
Zhihui Yao
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences
Hengran Zhang
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Keping Bi
Institute of Computing Technology, Chinese Academy of Sciences
Information Retrieval