Aligning Large Language Model Behavior with Human Citation Preferences

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study examines discrepancies between large language model citation behavior and human preferences, particularly over when and what kinds of content to cite. The authors present the first systematic, fine-grained dataset spanning eight distinct citation motivations. Combining web-text classification, exhaustive pairwise preference evaluation, and Direct Preference Optimization (DPO), they quantitatively measure and then calibrate model-human alignment across diverse text genres. Models cite spans that Wikipedia explicitly marks as needing citations up to 27% more often than humans do, yet under-cite sentences containing numerical values or personal names by 22.6% and 20.1%, respectively; on medical text, where humans demand citations most often, stronger models follow the human tendency. DPO training measurably tightens the alignment between model citation practices and human preferences.
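
For orientation, the calibration step relies on the standard DPO objective (Rafailov et al., 2023). The minimal PyTorch sketch below is an assumption about how that objective would be applied to citation preference pairs, not the paper's released code; all tensor and function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of citation preference pairs.

    Each tensor holds the summed log-probability a model assigns to the
    human-preferred ("chosen") or dispreferred ("rejected") citation
    decision; the "ref" tensors come from a frozen reference model.
    """
    # How far the trained policy has moved from the reference,
    # separately for the chosen and the rejected response.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected, scaled by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of four preference pairs with made-up log-probabilities.
torch.manual_seed(0)
print(dpo_loss(torch.randn(4), torch.randn(4),
               torch.randn(4), torch.randn(4)).item())
```

Here beta controls how far the policy may drift from the reference model during calibration; the paper's actual hyperparameters are not reproduced here.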

📝 Abstract
Most services built on powerful large-scale language models (LLMs) add citations to their output to enhance credibility. Recent research has paid increasing attention to the question of what reference documents to link to outputs. However, how LLMs recognize cite-worthiness and how this process should be controlled remain underexplored. In this study, we focus on what kinds of content LLMs currently tend to cite and how well that behavior aligns with human preferences. We construct a dataset to characterize the relationship between human citation preferences and LLM behavior. Web-derived texts are categorized into eight citation-motivation types, and pairwise citation preferences are exhaustively evaluated across all type combinations to capture fine-grained contrasts. Our results show that humans most frequently seek citations for medical text, and stronger models display a similar tendency. We also find that current models are as much as $27\%$ more likely than humans to add citations to text that is explicitly marked as needing citations on sources such as Wikipedia, and this overemphasis reduces alignment accuracy. Conversely, models systematically underselect numeric sentences (by $-22.6\%$ relative to humans) and sentences containing personal names (by $-20.1\%$), categories for which humans typically demand citations. Furthermore, experiments with Direct Preference Optimization demonstrate that model behavior can be calibrated to better match human citation preferences. We expect this study to provide a foundation for more fine-grained investigations into LLM citation preferences.
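
As a concrete reading of the exhaustive pairwise design, the sketch below scores model-human agreement over all C(8, 2) = 28 type pairs. The eight type names are hypothetical placeholders standing in for the paper's actual citation-motivation taxonomy, and both judge functions are toys.

```python
from itertools import combinations

# Hypothetical placeholders for the paper's eight citation-motivation
# types; the actual taxonomy is defined in the paper's dataset.
MOTIVATION_TYPES = [
    "medical_claim", "numeric_value", "personal_name", "direct_quotation",
    "statistic", "historical_event", "technical_definition", "opinion",
]

def agreement_rate(model_prefers, human_prefers) -> float:
    """Fraction of all C(8, 2) = 28 type pairs on which the model's
    citation preference matches the human majority preference."""
    pairs = list(combinations(MOTIVATION_TYPES, 2))
    hits = sum(model_prefers(a, b) == human_prefers(a, b) for a, b in pairs)
    return hits / len(pairs)

# Toy judges: humans always prefer citing the medical span; the model
# instead prefers whichever type name comes first alphabetically.
human = lambda a, b: a if "medical" in a else (b if "medical" in b else a)
model = lambda a, b: min(a, b)
print(f"pairwise agreement: {agreement_rate(model, human):.2f}")
```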
Problem

Research questions and friction points this paper is trying to address.

citation alignment
large language models
human preferences
cite-worthiness
reference behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

citation alignment
human preference modeling
large language models
Direct Preference Optimization
cite-worthiness