From Coverage to Prestige: A Comprehensive Assessment of Large-Scale Scientometric Data

📅 2025-03-05
🤖 AI Summary
This study addresses citation incompleteness, and the resulting bias in scholarly impact assessment, that arises when scientometric data sources differ. It systematically compares the coverage of Web of Science (WoS) and Crossref using a two-dimensional quality evaluation framework: Reference Coverage Rate (RCR) for citation quantity and Article Scientific Prestige (ASP) for influence quality. The framework builds on entity matching and alignment, multi-granularity disciplinary clustering, and attribution analysis of the merging effect, giving an integrated assessment that captures both dimensions together. Results show pronounced disciplinary asymmetry and quality polarization when the data are merged: WoS is stronger in covering high-impact publications, whereas Crossref adds breadth. Merging the two substantially improves citation network completeness in smaller disciplines but widens the gap in quality, and disciplines differ markedly in how sensitive they are to the merge.
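The entity matching step mentioned above can be illustrated with a minimal sketch. It assumes records are matched on a normalized DOI, which the summary does not state; the record layout (`id`, `doi`) and the function names `normalize_doi` and `match_records` are hypothetical and only serve the illustration.

```python
# Hypothetical sketch: align WoS and Crossref records by normalized DOI.
# DOI-based matching is an assumption, not the paper's documented method.

DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def match_records(wos_records, crossref_records):
    """Return (wos_id, crossref_id) pairs whose normalized DOIs are equal."""
    crossref_by_doi = {
        normalize_doi(rec["doi"]): rec["id"]
        for rec in crossref_records
        if rec.get("doi")
    }
    pairs = []
    for rec in wos_records:
        if not rec.get("doi"):
            continue  # records without a DOI cannot be matched this way
        key = normalize_doi(rec["doi"])
        if key in crossref_by_doi:
            pairs.append((rec["id"], crossref_by_doi[key]))
    return pairs


wos = [{"id": "W1", "doi": "10.1000/ABC"}, {"id": "W2", "doi": None}]
crossref = [{"id": "C1", "doi": "https://doi.org/10.1000/abc"}]
matched = match_records(wos, crossref)
```

Records without a DOI are skipped here; a real pipeline would likely need a fallback such as title and year matching, which this sketch leaves out.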

📝 Abstract
As research in scientometrics deepens, the impact of data quality on research outcomes has garnered increasing attention. This study, based on Web of Science (WoS) and Crossref datasets, systematically evaluates the differences between data sources and the effects of data merging through matching, comparison, and integration. Two core metrics were employed: Reference Coverage Rate (RCR) and Article Scientific Prestige (ASP), which respectively measure citation completeness (quantity) and academic influence (quality). The results indicate that the WoS dataset outperforms Crossref in its coverage of high-impact literature and in ASP scores, while the Crossref dataset provides complementary value through its broader coverage of literature. Data merging significantly improves the completeness of the citation network, with particularly pronounced benefits in smaller disciplinary clusters such as Education and Arts. However, data merging also introduces some low-quality citations, resulting in a polarization of overall data quality. Moreover, the impact of data merging varies across disciplines: high-impact clusters such as Science, Biology, and Medicine benefit the most, whereas clusters like Social Sciences and Arts are more vulnerable to negative effects. This study highlights the critical role of data sources in scientometric research and provides a framework for assessing and improving data quality.
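The abstract does not give a formula for RCR. As a rough illustration only, the sketch below assumes RCR is the share of a paper's cited references that resolve to a record in a given dataset, and that merging means taking the union of the two datasets' records. The function name `reference_coverage_rate` and the toy identifiers are hypothetical.

```python
# Hypothetical sketch of Reference Coverage Rate (RCR).
# Assumed definition: fraction of a paper's references found in the dataset.
# The paper's exact definition may differ.

def reference_coverage_rate(cited_ids, indexed_ids):
    """Return the fraction of distinct cited references present in indexed_ids."""
    cited = set(cited_ids)
    if not cited:
        return 0.0  # convention chosen for this sketch: no references, no coverage
    return len(cited & set(indexed_ids)) / len(cited)


# Toy data: one paper citing four works, and what each source has indexed.
references = ["a", "b", "c", "d"]
wos_indexed = {"a", "b"}
crossref_indexed = {"b", "c"}
merged_indexed = wos_indexed | crossref_indexed  # merging modeled as set union

rcr_wos = reference_coverage_rate(references, wos_indexed)
rcr_crossref = reference_coverage_rate(references, crossref_indexed)
rcr_merged = reference_coverage_rate(references, merged_indexed)
```

In this toy case each source alone covers half of the references, while the merged set covers three of four, which mirrors the direction of the completeness gain the abstract reports without reproducing any of its numbers.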
Problem

Research questions and friction points this paper is trying to address.

Evaluates differences between Web of Science and Crossref datasets.
Assesses impact of data merging on citation completeness and academic influence.
Explores disciplinary variations in data quality and merging effects.
Innovation

Methods, ideas, or system contributions that make the work stand out.

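Compares and merges the Web of Science and Crossref datasets.
Employs two metrics, Reference Coverage Rate (RCR) and Article Scientific Prestige (ASP), to measure citation completeness and academic influence.
Shows that data merging improves citation network completeness.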
Guoyang Rong
Wuhan University, PhD Candidate
Ying Chen
Center for Quantitative Finance, Department of Mathematics, National University of Singapore, 119076, Singapore; Risk Management Institute, National University of Singapore, 119076, Singapore
Thorsten Koch
TU Berlin / Zuse Institute Berlin
Mathematics, Linear Programming, Integer Programming
Keisuke Honda
The Institute of Statistical Mathematics, 190-8562, Japan