AI Summary
This work addresses the pervasive problem of citation errors in scientific literature, which existing methods cannot verify at scale because they rely on abstracts or limited datasets. We propose BibAgent, an end-to-end agent framework that integrates large language models, cross-source document retrieval, and an adaptive evidence aggregation mechanism, applying tailored strategies to open-access and paywalled publications. A key innovation is the Evidence Committee mechanism, which infers whether citations to papers with restricted content are valid by seeking consensus among the downstream works that cite those papers. To support systematic evaluation, we introduce MisciteBench, a large-scale, cross-disciplinary benchmark of 6,350 samples spanning five categories of miscitation. Experiments show that BibAgent significantly outperforms existing LLM-based baselines in both accuracy and interpretability, enabling efficient, traceable, and scalable detection of citation errors.
Abstract
Citations are the bedrock of scientific authority, yet their integrity is compromised by widespread miscitations, ranging from nuanced distortions to fabricated references. Systematic citation verification is currently infeasible: manual review cannot scale to modern publishing volumes, while existing automated tools are limited to abstract-only analysis or small-scale, domain-specific datasets, in part because of the "paywall barrier" to full-text access. We introduce BibAgent, a scalable, end-to-end agentic framework for automated citation verification. BibAgent integrates retrieval, reasoning, and adaptive evidence aggregation, applying distinct strategies to accessible and paywalled sources. For paywalled references, it uses a novel Evidence Committee mechanism that infers citation validity from the consensus of downstream citing works. To support systematic evaluation, we contribute a 5-category Miscitation Taxonomy and MisciteBench, a massive cross-disciplinary benchmark comprising 6,350 miscitation samples spanning 254 fields. Our results show that BibAgent outperforms state-of-the-art Large Language Model (LLM) baselines in citation verification accuracy and interpretability, providing scalable, transparent detection of citation misalignments across the scientific literature.
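The abstract does not say how BibAgent routes sources or how the Evidence Committee turns downstream citations into a verdict. As a minimal sketch, assuming each downstream citing work casts one vote and a simple majority with a quorum decides, the routing and aggregation could look like the following. The function names, verdict labels, and thresholds (`min_votes`, `min_agreement`) are hypothetical illustrations, not BibAgent's actual implementation.

```python
from collections import Counter
from typing import Optional, Sequence


def choose_strategy(full_text: Optional[str]) -> str:
    """Hypothetical router: accessible sources get full-text checking,
    paywalled ones (no text, or empty text) go to the committee."""
    return "full_text" if full_text else "evidence_committee"


def committee_verdict(
    votes: Sequence[str],
    min_votes: int = 3,          # assumed quorum of downstream citers
    min_agreement: float = 0.6,  # assumed share needed for consensus
) -> str:
    """Aggregate per-citer judgments into one verdict by majority vote.

    Each vote is a hypothetical label ("supported" or "unsupported")
    from checking the claim against how one downstream paper
    describes the paywalled reference.
    """
    if len(votes) < min_votes:
        return "uncertain"  # too few citers to form a committee
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) < min_agreement:
        return "uncertain"  # committee is split, no consensus
    return label


print(choose_strategy(None))  # evidence_committee
print(committee_verdict(["supported", "supported", "unsupported", "supported"]))  # supported
print(committee_verdict(["supported", "unsupported"]))  # uncertain
```

Returning "uncertain" when the committee is too small or split keeps the sketch from asserting a verdict the downstream evidence does not support; the real framework may weight citers or reason over their text rather than count votes.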