Source Coverage and Citation Bias in LLM-based vs. Traditional Search Engines

📅 2025-12-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the limited citation transparency and unclear source bias of Large Language Model–based Search Engines (LLM-SEs), systematically benchmarking them against Traditional Search Engines (TSEs) across source coverage breadth, credibility, political neutrality, and security. The authors conduct a large-scale empirical study on 55,936 real-world queries across six LLM-SEs and two TSEs, proposing the first attribution analysis framework for source selection, one that integrates log mining, cross-engine comparison, and multidimensional quality metrics. Results show that while LLM-SEs significantly broaden source diversity (37% of retrieved sources are unique to LLM-SEs), they do not improve credibility, neutrality, or security over TSEs. Key technical factors, including retrieval-augmented generation design, and content attributes, such as page authority and ideological leaning, are identified as the primary determinants of source selection. These findings provide an empirical foundation for designing trustworthy, transparent, and bias-aware LLM-SEs.
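The cross-engine comparison behind the "37% unique sources" figure can be illustrated with simple set arithmetic over the domains each engine returns. The engine names and domain lists below are illustrative placeholders, not the paper's data:

```python
# Hypothetical sketch of cross-engine domain-overlap measurement.
# Each engine maps to the set of domains it cited/returned for a query set.
llm_se_citations = {
    "llm_engine_a": {"example.org", "nicheblog.net", "wiki.example.com"},
    "llm_engine_b": {"example.org", "forum.example.io"},
}
tse_results = {
    "tse_engine_a": {"example.org", "wiki.example.com", "news.example.com"},
    "tse_engine_b": {"example.org", "news.example.com"},
}

# Pool domains across engines of each type, then find LLM-SE-only domains.
llm_domains = set().union(*llm_se_citations.values())
tse_domains = set().union(*tse_results.values())
unique_to_llm = llm_domains - tse_domains
share_unique = len(unique_to_llm) / len(llm_domains)

print(f"{len(unique_to_llm)} of {len(llm_domains)} LLM-SE domains "
      f"({share_unique:.0%}) never appear in TSE results")
```

At the paper's scale the same computation would run over domains extracted from 55,936 queries' result logs rather than a handful of hand-written sets.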

📝 Abstract
LLM-based Search Engines (LLM-SEs) introduce a new paradigm for information seeking. Unlike Traditional Search Engines (TSEs) (e.g., Google), these systems summarize results, often providing limited citation transparency. The implications of this shift remain largely unexplored, yet it raises key questions regarding trust and transparency. In this paper, we present a large-scale empirical study of LLM-SEs, analyzing 55,936 queries and the corresponding search results across six LLM-SEs and two TSEs. We confirm that LLM-SEs cite domain resources with greater diversity than TSEs. Indeed, 37% of domains are unique to LLM-SEs. However, certain risks still persist: LLM-SEs do not outperform TSEs in credibility, political neutrality, and safety metrics. Finally, to understand the selection criteria of LLM-SEs, we perform a feature-based analysis to identify key factors influencing source choice. Our findings provide actionable insights for end users, website owners, and developers.
Problem

Research questions and friction points this paper is trying to address.

Compares source diversity and citation transparency in LLM-based vs. traditional search engines.
Assesses credibility, neutrality, and safety risks in LLM-based search engine outputs.
Identifies factors influencing source selection in LLM-based search engines.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale empirical study of LLM-based search engines
Feature-based analysis to identify source selection criteria
Comparison of source diversity and credibility with traditional engines
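A feature-based source-selection analysis of the kind listed above can be sketched as comparing feature averages between pages an LLM-SE cited and pages it skipped. The feature names (domain authority, ideological lean) echo the paper's findings, but the values and the mean-gap method here are illustrative assumptions, not the authors' actual analysis:

```python
# Hypothetical sketch: which page features separate cited from skipped sources?
# Each page: (domain_authority in [0,1], ideological_lean in [-1,1], cited?)
pages = [
    (0.9,  0.0, True),
    (0.8, -0.2, True),
    (0.7,  0.1, True),
    (0.3,  0.6, False),
    (0.4, -0.7, False),
    (0.2,  0.0, False),
]

def mean(xs):
    return sum(xs) / len(xs)

cited = [p for p in pages if p[2]]
skipped = [p for p in pages if not p[2]]

# Feature extractors; |lean| treats left and right slant symmetrically.
features = {
    "domain_authority": lambda p: p[0],
    "abs_ideological_lean": lambda p: abs(p[1]),
}

for name, f in features.items():
    gap = mean([f(p) for p in cited]) - mean([f(p) for p in skipped])
    print(f"{name}: cited-minus-skipped gap = {gap:+.2f}")
```

In this toy data the cited pages have markedly higher authority and lower absolute ideological lean; a real analysis would fit a proper model (e.g., regression with controls) over the full query log rather than eyeballing mean gaps.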