A comparison of citation-based clustering and topic modeling for science mapping

📅 2023-09-12
🏛️ arXiv.org
📈 Citations: 1
Influential: 0

🤖 AI Summary
Two traditional scientometric approaches, topic modeling (TM, e.g., LDA) and citation clustering (CC, e.g., VOS clustering), make different assumptions about scientific structure, yet how well each represents cardiovascular science remains underexplored. Method: This study systematically compares TM and CC across three dimensions (disciplinary architecture, responsiveness to societal needs, and delineation of academic micro-communities) and introduces a cross-method mapping framework to align topics and clusters. Contribution/Results: Empirical analysis reveals only weak correspondence between TM-derived topics and CC-derived clusters: only in a few exceptional cases do more than one-third of the documents in a topic fall in the same cluster, or vice versa. TM proves more sensitive to policy-relevant societal themes (e.g., public health interventions), whereas CC more accurately reconstructs knowledge evolution trajectories and disease-subtype-driven scholarly communities. The findings point to functional complementarity: neither method alone captures the multidimensional nature of a scientific field. This work provides a methodological basis for informed tool selection in scientometrics and science policy research.
📝 Abstract
Understanding the different ways in which different science mapping approaches capture the structure of scientific fields is critical. This paper presents a comparative analysis of two commonly used approaches, topic modeling (TM) and citation-based clustering (CC), to assess their respective strengths, weaknesses, and the characteristics of their results. We compare the two approaches using cluster-to-topic and topic-to-cluster mappings based on science maps of cardiovascular research generated by TM and CC. Our findings reveal that relations between topics and clusters are generally weak, with limited overlap between topics and clusters. Only in a few exceptional cases do more than one-third of the documents in a topic belong to the same cluster, or vice versa. For TM the presence of highly similar topics is a considerable challenge. A strength of TM is its ability to represent societal needs related to cardiovascular disease, potentially offering valuable insights for policymakers. In contrast, CC excels in depicting the intellectual structure of cardiovascular diseases, with a strong capability to reflect scientific micro-communities. This study deepens the understanding of the use of TM and CC for science mapping, providing insights for users on how to apply these approaches based on their needs.
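The topic-to-cluster and cluster-to-topic comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each document carries exactly one topic label (for example its dominant topic) and one cluster label, which is a simplification, since topic models typically assign documents to several topics. The function name max_overlap_shares and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def max_overlap_shares(source_labels, target_labels):
    """For each source group, find the target group holding most of its documents.

    Returns {source_group: (best_target_group, share)}, where share is the
    fraction of the source group's documents that fall in that target group.
    Assumes one label per document on each side (an illustrative simplification).
    """
    if len(source_labels) != len(target_labels):
        raise ValueError("label lists must have the same length")

    # Count how the documents of each source group spread over target groups.
    counts = defaultdict(Counter)
    for source, target in zip(source_labels, target_labels):
        counts[source][target] += 1

    shares = {}
    for source, counter in counts.items():
        best_target, n_docs = counter.most_common(1)[0]
        shares[source] = (best_target, n_docs / sum(counter.values()))
    return shares


# Hypothetical toy data: one topic label and one cluster label per document.
topics = ["t1", "t1", "t1", "t2", "t2", "t2"]
clusters = ["c1", "c1", "c2", "c1", "c2", "c3"]

topic_to_cluster = max_overlap_shares(topics, clusters)
cluster_to_topic = max_overlap_shares(clusters, topics)

# Flag topics whose largest share in a single cluster exceeds one third.
strong_topics = [t for t, (_, share) in topic_to_cluster.items() if share > 1 / 3]
```

In this toy data, two of the three documents in topic t1 fall in cluster c1, so t1 passes the one-third threshold, while topic t2 is spread evenly over three clusters and does not. Running the function in both directions matters because the two shares are not symmetric: a small cluster can sit entirely inside a topic that is itself scattered across many clusters.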
Problem

Research questions and friction points this paper is trying to address.

Compare topic modeling and citation-based clustering as science mapping approaches
Assess the strengths and weaknesses of each approach
Understand how each approach captures the structure of a scientific field
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comparative analysis of TM and CC
Cluster-to-topic and topic-to-cluster mappings
Insights for science mapping application