🤖 AI Summary
This study challenges the prevailing assumption that finer citation granularity invariably yields better attribution in generative models, systematically examining how sentence-, paragraph-, and document-level citations affect both attribution quality and answer correctness. Using a controlled attribution-generation framework and fine-grained evaluation metrics across models ranging from 8B to 120B parameters, the work reveals a non-monotonic interaction between citation granularity and model scale: overly fine-grained citations impair large models' ability to synthesize information across multiple sentences. Experimental results show that paragraph-level citations improve attribution quality by 16-276% over sentence-level citations while maintaining or even improving answer accuracy, suggesting that an intermediate granularity aligned with the model's semantic scope best balances credibility and reliability.
📝 Abstract
Citation granularity (whether to cite individual sentences, paragraphs, or documents) is a critical design choice in attributed generation. While fine-grained citations are often preferred because they ease precise human verification, their impact on model performance remains under-explored. We analyze four model scales (8B-120B) and demonstrate that enforcing fine-grained citations degrades attribution quality by 16-276% relative to the best-performing granularity. We observe a consistent pattern in which attribution quality peaks at intermediate granularities (paragraph-level). Our analysis suggests that fine-grained (sentence-level) citations disrupt the semantic dependencies necessary for attributing evidence to answer claims, while excessively coarse citations (multi-paragraph) introduce distracting noise. Importantly, the magnitude of this performance gap varies non-monotonically with model scale: fine-grained constraints disproportionately penalize larger models, suggesting that atomic citation units disrupt the multi-sentence information synthesis at which these models excel. Strikingly, choosing the citation-optimal granularity yields substantial gains in attribution quality while preserving or even improving answer correctness. Overall, our findings demonstrate that optimizing solely for human verification via fine-grained citations disregards model constraints, compromising both attribution faithfulness and generation reliability. Effective attribution instead requires aligning citation granularity with the model's natural semantic scope.