🤖 AI Summary
Citation networks are often fragmented because scientifically related articles do not always cite one another, which hinders effective modeling of scientific structure. This work proposes a hybrid framework that combines citation topology with large language model (LLM)-based textual similarity: it adds semantic edges to reconnect small, disconnected components and reweights existing citations by textual similarity. The approach pairs LLM-based similarity computation with graph augmentation and Leiden community detection, and is evaluated on 662,369 Web of Science publications in Mathematics and Operations Research & Management Science. The results indicate that the method substantially reduces fragmentation while preserving disciplinary homogeneity. Compared to embedding-only clustering, Leiden clustering on the augmented graph retains structural interpretability and offers multi-scale organization. The authors report that the framework scales efficiently to large datasets, making it practical for large-scale scientometric studies.
📝 Abstract
Citation graphs are fundamental tools for modeling scientific structure, but they are often fragmented because scientifically connected articles do not always cite one another. To address this issue, we propose a computationally efficient hybrid framework integrating citation topology with large language model (LLM)-based text similarity. Using 662,369 Web of Science publications in Mathematics and Operations Research & Management Science, we augment the original graph by adding semantic edges from small, disconnected components and weighting existing citations according to textual similarity. Semantic augmentation substantially reduces fragmentation while preserving disciplinary homogeneity. Compared to embedding-only clustering, cluster detection on augmented graphs using the Leiden algorithm retains structural interpretability while offering multi-scale organization. The method scales efficiently to large datasets and offers a practical strategy for strengthening citation-based indicators without collapsing disciplinary boundaries.
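The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: cosine similarity as the text-similarity measure, the `max_small` component-size cutoff, the `min_sim` threshold, the rule of linking each small component to its single most similar outside document, and the hand-written toy vectors that stand in for LLM embeddings.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def connected_components(n, edges):
    """Return the connected components of an undirected graph on nodes 0..n-1."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(sorted(comp))
    return comps


def augment(n, citations, emb, max_small=2, min_sim=0.8):
    """Weight citations by similarity and attach small components.

    max_small and min_sim are hypothetical parameters, not values from the paper.
    """
    # Step 1: weight every existing citation by textual similarity.
    weighted = {tuple(sorted(e)): cosine(emb[e[0]], emb[e[1]]) for e in citations}

    # Step 2: for each small component, add one semantic edge to the most
    # similar document outside it, if that similarity is high enough.
    comps = connected_components(n, citations)
    largest = max(comps, key=len)
    for comp in comps:
        if comp is largest or len(comp) > max_small:
            continue
        members = set(comp)
        candidates = [
            (cosine(emb[i], emb[j]), i, j)
            for i in comp
            for j in range(n)
            if j not in members
        ]
        if candidates:
            sim, i, j = max(candidates)
            if sim >= min_sim:
                weighted[tuple(sorted((i, j)))] = sim
    return weighted


# Toy data: documents 0-2 cite each other in a chain; 3 and 4 are isolated.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.8, 0.2, 0.05],  # textually close to document 2
    [0.0, 0.0, 1.0],   # unrelated to everything else
]
citations = [(0, 1), (1, 2)]

augmented = augment(len(embeddings), citations, embeddings)
before = len(connected_components(len(embeddings), citations))
after = len(connected_components(len(embeddings), list(augmented)))
```

In this toy example, document 3 is attached to document 2 by a semantic edge, while the unrelated document 4 stays isolated because its best similarity falls below the threshold. That mirrors the abstract's point that augmentation should reduce fragmentation without merging unrelated material. The resulting weighted edge list could then be passed to a Leiden implementation for cluster detection; that step is omitted here.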