Reconnecting Fragmented Citation Networks with Semantic Augmentation

📅 2026-05-12
📈 Citations: 0
Influential: 0

🤖 AI Summary
Citation networks are often fragmented because scientifically connected articles do not always cite each other, which hinders modeling of scientific structure. This work proposes a hybrid framework that combines citation topology with large language model (LLM)-based text similarity: it adds semantic edges from small, disconnected components and weights existing citations by textual similarity. The approach is evaluated on 662,369 Web of Science publications in Mathematics and Operations Research & Management Science. Semantic augmentation substantially reduces fragmentation while preserving disciplinary homogeneity, and, compared to embedding-only clustering, Leiden cluster detection on the augmented graph retains structural interpretability while offering multi-scale organization. The method scales efficiently to large datasets.
📝 Abstract
Citation graphs are fundamental tools for modeling scientific structure, but are often fragmented due to missing citations of scientifically connected articles. To address this issue, we propose a computationally efficient hybrid framework integrating citation topology with large language model (LLM)-based text similarity. Using 662,369 Web of Science publications in Mathematics and Operations Research & Management Science, we augment the original graph by adding semantic edges from small, disconnected components and weighting existing citations according to textual similarity. Semantic augmentation substantially reduces fragmentation while preserving disciplinary homogeneity. Compared to embedding-only clustering, cluster detection on augmented graphs using the Leiden algorithm retains structural interpretability while offering multi-scale organization. The method scales efficiently to large datasets and offers a practical strategy for strengthening citation-based indicators without collapsing disciplinary boundaries.
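The augmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each publication already has a text embedding (plain lists of floats here), uses cosine similarity as the text-similarity measure, and attaches each small component to its most similar node in the largest component. The function names and the `max_small` and `min_sim` thresholds are hypothetical; the abstract does not state the actual attachment rule or cutoffs.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1, largest first."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for node in range(n):
        groups[find(node)].append(node)
    return sorted(groups.values(), key=len, reverse=True)


def augment(embeddings, citations, max_small=2, min_sim=0.5):
    """Weight citation edges by similarity and link small components semantically.

    max_small and min_sim are illustrative thresholds, not values from the paper.
    """
    # Existing citations are kept and weighted by textual similarity.
    weighted = {(a, b): cosine(embeddings[a], embeddings[b]) for a, b in citations}
    comps = components(len(embeddings), citations)
    giant = comps[0]
    semantic = {}
    for comp in comps[1:]:
        if len(comp) > max_small:
            continue  # only small, disconnected components receive a semantic edge
        sim, u, v = max(
            (cosine(embeddings[u], embeddings[v]), u, v) for u in comp for v in giant
        )
        if sim >= min_sim:
            semantic[(u, v)] = sim
    return weighted, semantic


# Toy data: nodes 0-2 are linked by citations, nodes 3 and 4 are isolated.
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
citations = [(0, 1), (1, 2)]
weighted, semantic = augment(embeddings, citations)
print(weighted)
print(semantic)
```

In this sketch the union of `weighted` and `semantic` forms the augmented, weighted graph. That graph could then be passed to a Leiden implementation for cluster detection, a step not shown here.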
Problem

Research questions and friction points this paper is trying to address.

citation networks
fragmentation
missing citations
scientific structure
semantic augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic augmentation
citation network
large language model
graph fragmentation
Leiden algorithm
Authors
Vu Thi Huong
Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany; Institute of Mathematics, Vietnam Academy of Science and Technology, 10072 Hanoi, Vietnam
Annika Buchholz
Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
Imene Khebouri
Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
Thorsten Koch
TU Berlin / Zuse Institute Berlin
Mathematics, Linear Programming, Integer Programming
Tim Kunt
Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
Wolfgang Peters-Kottig
Kooperativer Bibliotheksverbund Berlin-Brandenburg (KOBV), Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
Tomasz Stompor
Kooperativer Bibliotheksverbund Berlin-Brandenburg (KOBV), Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
Janina Zittel
Applied Optimization, Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany