🤖 AI Summary
This study reports a new set of “sneaked references”: citations that are registered in Crossref metadata but absent from both the full text and the reference lists of the publications concerned, and that all cite the same journal.
Method: We propose a cross-source comparative detection paradigm integrating Crossref API metadata, PDF full-text parsing (with automated reference extraction), and three rule-driven algorithms: exact matching, fuzzy matching, and citation graph consistency checking.
Contribution/Results: Applied to the IJISRT journal, our approach detects 80,205 sneaked references, achieving an F1-score of 0.92 with the optimal configuration. Scalability and robustness are validated across >2 million articles. This work extends scholarly integrity auditing from content-level analysis to metadata-level scrutiny, establishing a novel paradigm for citation ecosystem governance and enabling systematic detection of metadata-mediated citation manipulation.
📝 Abstract
We report evidence of a new set of sneaked references discovered in the scientific literature. Sneaked references are references registered in the metadata of a publication without being listed in the reference section or in the full text of the publication where they ought to be found. We document here 80,205 references sneaked into the metadata of the International Journal of Innovative Science and Research Technology (IJISRT). These sneaked references are registered with Crossref and all cite, and thus benefit, this same journal. Using this dataset, we evaluate three different methods to automatically identify sneaked references. These methods compare the reference lists registered with Crossref against the full text or the reference lists extracted from PDF files. In addition, we report attempts to scale the search for sneaked references to the wider scholarly literature.
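The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.8 similarity threshold, and the use of Python's `difflib` for fuzzy matching are all assumptions made for illustration. The sketch also assumes each registered reference is a dictionary with optional `DOI` and `unstructured` keys, which is how reference entries commonly appear in Crossref work records, and that the PDF reference strings and full text have already been extracted by some other tool.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and reduce punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def is_listed(registered, pdf_refs, full_text="", threshold=0.8):
    """Return True if a registered reference can be found in the publication.

    registered: dict with optional "DOI" and "unstructured" keys (assumed shape).
    pdf_refs:   reference strings extracted from the PDF (extraction not shown).
    full_text:  plain text of the PDF, used for an exact DOI lookup.
    threshold:  hypothetical similarity cut-off for the fuzzy comparison.
    """
    # Exact matching: the registered DOI appears verbatim in the full text.
    doi = registered.get("DOI", "").lower()
    if doi and doi in full_text.lower():
        return True

    # Fuzzy matching: compare the registered citation string with each PDF reference.
    target = normalize(registered.get("unstructured", ""))
    if not target:
        # Nothing left to compare, so the reference stays flagged for manual review.
        return False
    return any(
        SequenceMatcher(None, target, normalize(ref)).ratio() >= threshold
        for ref in pdf_refs
    )


def find_sneaked(registered_refs, pdf_refs, full_text="", threshold=0.8):
    """Return registered references that could not be located in the publication."""
    return [
        ref for ref in registered_refs
        if not is_listed(ref, pdf_refs, full_text, threshold)
    ]
```

A reference returned by `find_sneaked` is only a candidate: PDF extraction errors or unusual citation formatting can produce false positives, so the threshold would need tuning against labelled data such as the IJISRT set described above.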