🤖 AI Summary
Existing measures of textual granularity mostly capture surface detail or sentence specificity, and so miss the range of granularity levels at which natural language conveys information. This work proposes Granuscore, a reference-free measure that quantifies granularity using structural properties of a hierarchical embedding space. On the Granola-EQ dataset, Granuscore reliably recovers hierarchical orderings, and it captures expected differences in granularity across discourse contexts. Across domains, it explains non-linear variation in sentence specificity beyond sentence length. Applied to four question-answering benchmarks, it reveals consistent differences in granularity among questions, gold answers, and model outputs across response outcomes, offering a principled lens for characterizing the difficulty of QA datasets.
📝 Abstract
Natural language conveys information at varying levels of granularity, from fine-grained references to broad descriptions. While granularity is fundamental to human communication, existing measures mostly capture surface detail or sentence specificity. We introduce Granuscore, a reference-free measure of granularity that leverages structural properties of a hierarchical embedding space. Granuscore reliably recovers hierarchical orderings on the Granola-EQ dataset and captures expected differences in granularity across discourse contexts. Across domains, we further show that Granuscore explains non-linear variation in sentence specificity beyond sentence length. Finally, we apply Granuscore to four question-answering benchmarks and analyze how granularity differs for questions, gold answers, and model outputs across response outcomes. The analysis reveals consistent differences in model behavior and provides a principled lens for characterizing the difficulty of QA datasets. Together, the results position Granuscore as a scalable, broadly applicable tool for analyzing granularity in text.