🤖 AI Summary
This study addresses citation incompleteness, and the resulting bias in scholarly impact assessment, that arises from heterogeneous data sources in scientometrics. It systematically compares the coverage of Web of Science (WoS) and Crossref and introduces a two-dimensional quality evaluation framework, Reference Coverage Rate (RCR) and Article Scientific Prestige (ASP), built on entity matching and alignment, multi-granularity disciplinary clustering, and attribution analysis of fusion effects. The work establishes, for the first time, an integrated assessment system that jointly captures citation *quantity* and influence *quality*. Results reveal pronounced disciplinary asymmetry and quality polarization in data fusion: WoS covers high-impact publications better, whereas Crossref adds breadth. Merging the two substantially improves citation network completeness in niche disciplines but widens the quality split, and disciplines differ markedly in how sensitive they are to the merge.
📝 Abstract
As research in scientometrics deepens, the impact of data quality on research outcomes has garnered increasing attention. This study, based on the Web of Science (WoS) and Crossref datasets, systematically evaluates the differences between the two data sources and the effects of merging them, through matching, comparison, and integration. Two core metrics are employed: Reference Coverage Rate (RCR), which measures citation completeness (quantity), and Article Scientific Prestige (ASP), which measures academic influence (quality). The results indicate that the WoS dataset outperforms Crossref in its coverage of high-impact literature and in ASP scores, while the Crossref dataset provides complementary value through its broader coverage of the literature. Data merging significantly improves the completeness of the citation network, with particularly pronounced benefits in smaller disciplinary clusters such as Education and Arts. However, data merging also introduces some low-quality citations, which polarizes overall data quality. Moreover, the impact of data merging varies across disciplines: high-impact clusters such as Science, Biology, and Medicine benefit the most, whereas clusters such as Social Sciences and Arts are more vulnerable to negative effects. This study highlights the critical role of data sources in scientometric research and provides a framework for assessing and improving data quality.
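The abstract does not give a formula for RCR, so the following is only a minimal sketch under an assumed definition: the share of a paper's bibliography entries that a data source records. The function names, the DOI-based matching, and the example DOIs are all hypothetical and are not taken from the study. It illustrates how merging two sources can raise coverage for a single paper.

```python
def normalize_doi(doi):
    """Lowercase and trim a DOI so the same work matches across sources."""
    return doi.strip().lower()


def reference_coverage_rate(recorded_refs, total_refs):
    """Assumed definition: distinct recorded references / references in the paper."""
    if total_refs <= 0:
        raise ValueError("total_refs must be positive")
    distinct = {normalize_doi(d) for d in recorded_refs}
    return len(distinct) / total_refs


def merge_references(refs_a, refs_b):
    """Union of two reference lists, matched on normalized DOI."""
    return {normalize_doi(d) for d in refs_a} | {normalize_doi(d) for d in refs_b}


# Hypothetical paper whose bibliography lists 5 references.
total = 5
wos_refs = ["10.1000/a", "10.1000/b", "10.1000/c"]
crossref_refs = ["10.1000/B", "10.1000/d"]  # "B" duplicates "b" after normalization

merged_refs = merge_references(wos_refs, crossref_refs)

rcr_wos = reference_coverage_rate(wos_refs, total)
rcr_crossref = reference_coverage_rate(crossref_refs, total)
rcr_merged = reference_coverage_rate(merged_refs, total)

print(rcr_wos, rcr_crossref, rcr_merged)
```

In this toy case the merged record covers four of the five references, more than either source alone. The sketch says nothing about ASP or about the quality of the added citations, which the abstract identifies as the cost of merging.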