🤖 AI Summary
Automated label generation for scientific literature clustering faces a trade-off: traditional methods yield concise yet opaque labels, while large language models (e.g., ChatGPT) produce highly readable descriptive labels that have so far lacked theoretical grounding, systematic methodology, and empirical validation. This paper introduces a dichotomous framework distinguishing *characteristic* from *descriptive* labels. It formalizes descriptive labeling, defines principled evaluation metrics, and proposes a structured, language-model-driven generation pipeline integrating cluster analysis, text feature extraction, and readability optimization. Experiments indicate that the resulting descriptive labels match characteristic labels in interpretability and quality, helping to bridge the theory-practice gap. The approach supports a bibliometric workflow that is fully automated yet human-centered, combining methodological rigor with linguistic naturalness.
📝 Abstract
Automated label generation for clusters of scientific documents is a common task in bibliometric workflows. Traditionally, labels were formed by concatenating distinguishing characteristics of a cluster's documents; while straightforward, this approach often produces labels that are terse and difficult to interpret. The advent and widespread accessibility of generative language models, such as ChatGPT, make it possible to automatically generate descriptive, human-readable labels that closely resemble those assigned by human annotators. Language-model label generation has already seen widespread use in bibliographic databases and analytical workflows, but its rapid adoption has outpaced its theoretical, practical, and empirical foundations. In this study, we address the automated label generation task and make four key contributions: (1) we define two distinct types of labels, characteristic and descriptive, and contrast descriptive labeling with related tasks; (2) we provide a formalization of descriptive labeling that clarifies its important steps and design considerations; (3) we propose a structured workflow for label generation and outline practical considerations for its use in bibliometric workflows; and (4) we develop an evaluative framework for assessing descriptive labels generated by language models and demonstrate that they perform at or near the level of characteristic labels. Together, these contributions clarify the descriptive label generation task, establish an empirical basis for the use of language models, and provide a framework to guide future design and evaluation efforts.
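To make the contrast concrete, the traditional *characteristic* label described above can be sketched in a few lines: score each term in a cluster by how strongly it distinguishes the cluster from a background corpus, then concatenate the top terms. This is a minimal, hypothetical illustration (function names, the smoothed log-ratio score, and the toy documents are all assumptions for exposition), not the paper's actual pipeline, which is language-model-driven.

```python
from collections import Counter
import math

def tokenize(text):
    # Naive whitespace tokenizer with light punctuation stripping.
    return [w.strip(".,").lower() for w in text.split()]

def characteristic_label(cluster_docs, background_docs, k=3):
    """Concatenate the k terms that most distinguish the cluster from the
    background corpus, scored by a smoothed log-ratio of relative frequencies."""
    cluster = Counter(t for d in cluster_docs for t in tokenize(d))
    background = Counter(t for d in background_docs for t in tokenize(d))
    n_c = sum(cluster.values())
    n_b = sum(background.values())

    def score(term):
        # Add-one smoothing so terms absent from the background still score.
        p_c = (cluster[term] + 1) / (n_c + 1)
        p_b = (background[term] + 1) / (n_b + 1)
        return math.log(p_c / p_b)

    top = sorted(cluster, key=score, reverse=True)[:k]
    return "; ".join(top)

docs = ["graphene transistor fabrication",
        "graphene sensor fabrication methods"]
background = ["protein folding dynamics",
              "protein structure prediction"]
print(characteristic_label(docs, background))  # → graphene; fabrication; transistor
```

The output ("graphene; fabrication; transistor") illustrates why such labels are terse and hard to interpret: the reader must infer the cluster's topic from disconnected terms, which is exactly the gap descriptive labels aim to close.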