citecheck: An MCP Server for Automated Bibliographic Verification and Repair in Scholarly Manuscripts

📅 2026-03-18
🤖 AI Summary
This work addresses persistent errors in academic manuscripts, such as incorrect citation identifiers, missing metadata, misattributed authorship, and confusion between preprints and published versions, all of which are made worse by the tendency of large language models to hallucinate citations. The authors present citecheck, a TypeScript-based Model Context Protocol (MCP) server that brings automated citation validation into agentic scholarly editing workflows. The system uses manifestation-aware matching and policy-gated rewrite planning, and validates entries against PubMed, Crossref, arXiv, and Semantic Scholar. It extracts references from .bib, .tex, .md, .txt, and .docx files and uses multi-pass retrieval to produce structured correction proposals. The research prototype ships with 47 passing tests covering repair behavior, malformed payload handling, transport failures, and MCP exposure. The authors position it as a practical guardrail against both traditional reference errors and LLM-induced citation hallucinations.

📝 Abstract
Reference lists in scholarly manuscripts frequently contain errors, including incorrect identifiers, incomplete metadata, misattributed authors, and mismatches between preprint and published versions. These problems are tedious to repair manually and have become more visible in workflows that rely on large language models, which can fabricate or corrupt citations. We present citecheck, a TypeScript system and MCP server for automated bibliographic verification and repair in paper-like project folders. Given a manuscript file or workspace, citecheck selects the most likely paper artifact, extracts references from .bib, .tex, .md, .txt, or .docx, validates entries against PubMed, Crossref, arXiv, and Semantic Scholar, and returns structured correction proposals together with replacement-safety diagnostics. The current repository provides a working research prototype with multi-pass retrieval, manifestation-aware matching, policy-gated rewrite planning, and 47 passing tests covering repair behavior, malformed payload handling, transport failures, and MCP exposure. We position citecheck as infrastructure for agentic scholarly editing and as a practical guardrail against both traditional reference errors and LLM-induced citation hallucinations.
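
The extraction step described in the abstract can be illustrated with a minimal sketch. It assumes a flat BibTeX entry and guesses the manifestation (preprint or published) from the identifiers alone. Every name in it (`ParsedReference`, `readField`, `parseBibEntry`) is hypothetical and not taken from the citecheck repository, and the heuristic is far simpler than what a real verifier would need.

```typescript
// Illustrative sketch only: all names here are hypothetical, not citecheck's API.
type Manifestation = "preprint" | "published" | "unknown";

interface ParsedReference {
  key: string;
  title: string | undefined;
  doi: string | undefined;
  arxivId: string | undefined;
  manifestation: Manifestation;
}

// DOIs registered by arXiv share this prefix (compared in lowercase).
const ARXIV_DOI_PREFIX = "10.48550/arxiv.";

// Reads a flat BibTeX field such as `doi = {10.1000/xyz}` or `doi = "10.1000/xyz"`.
// Nested braces inside the value are not handled.
function readField(entry: string, field: string): string | undefined {
  const pattern = new RegExp(`\\b${field}\\s*=\\s*[{"]([^}"]*)[}"]`, "i");
  return pattern.exec(entry)?.[1]?.trim();
}

function parseBibEntry(entry: string): ParsedReference {
  // The citation key sits between `@type{` and the first comma.
  const key = /@\w+\s*\{\s*([^,\s]+)\s*,/.exec(entry)?.[1] ?? "";
  const doi = readField(entry, "doi")?.toLowerCase();
  const arxivFromDoi =
    doi !== undefined && doi.startsWith(ARXIV_DOI_PREFIX)
      ? doi.slice(ARXIV_DOI_PREFIX.length)
      : undefined;
  // Assumption: an `eprint` field holds an arXiv identifier.
  const arxivId = readField(entry, "eprint") ?? arxivFromDoi;

  let manifestation: Manifestation = "unknown";
  if (arxivFromDoi !== undefined || (arxivId !== undefined && doi === undefined)) {
    manifestation = "preprint"; // only arXiv identifiers are present
  } else if (doi !== undefined) {
    manifestation = "published"; // a non-arXiv DOI is present
  }
  return { key, title: readField(entry, "title"), doi, arxivId, manifestation };
}

const sampleEntry = `@article{smith2020,
  title = {Deep Things},
  eprint = {2001.01234},
  archivePrefix = {arXiv}
}`;
console.log(parseBibEntry(sampleEntry));
```

A real pipeline would confirm this guess against the external sources rather than trust the entry's own fields, since those fields are exactly what may be wrong.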
Problem

Research questions and friction points this paper is trying to address.

bibliographic errors
citation hallucination
reference validation
scholarly manuscripts
LLM-induced errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

automated bibliographic verification
manifestation-aware matching
policy-gated rewrite planning
citation hallucination mitigation
MCP server
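
One way to read "policy-gated rewrite planning" together with "manifestation-aware matching" is sketched below. This is an interpretation under stated assumptions, not the authors' implementation: the types, thresholds, and the `planRewrite` function are all hypothetical. The idea is that a candidate record is applied automatically only when an explicit policy allows it, and otherwise surfaces as a suggestion.

```typescript
// Illustrative sketch only: names and thresholds are hypothetical.
type Version = "preprint" | "published";

interface Candidate {
  source: "pubmed" | "crossref" | "arxiv" | "semanticscholar";
  titleSimilarity: number; // assumed to be normalized to the range 0..1
  version: Version;
}

interface RewritePolicy {
  minSimilarityToSuggest: number;
  minSimilarityToReplace: number;
  // May a citation be rewritten to a different manifestation of the same work?
  allowVersionChange: boolean;
}

interface Decision {
  action: "replace" | "suggest" | "keep";
  reason: string;
}

function planRewrite(
  citedVersion: Version,
  candidate: Candidate | undefined,
  policy: RewritePolicy,
): Decision {
  if (candidate === undefined) {
    return { action: "keep", reason: "no candidate record found" };
  }
  if (candidate.titleSimilarity < policy.minSimilarityToSuggest) {
    return { action: "keep", reason: "candidate too dissimilar" };
  }
  // Manifestation gate: a preprint/published mismatch is never rewritten
  // automatically unless the policy explicitly permits it.
  if (candidate.version !== citedVersion && !policy.allowVersionChange) {
    return { action: "suggest", reason: "version differs; policy forbids automatic change" };
  }
  if (candidate.titleSimilarity >= policy.minSimilarityToReplace) {
    return { action: "replace", reason: "high-confidence match permitted by policy" };
  }
  return { action: "suggest", reason: "plausible match below replace threshold" };
}

const conservativePolicy: RewritePolicy = {
  minSimilarityToSuggest: 0.6,
  minSimilarityToReplace: 0.95,
  allowVersionChange: false,
};

console.log(
  planRewrite(
    "preprint",
    { source: "crossref", titleSimilarity: 0.98, version: "published" },
    conservativePolicy,
  ),
);
```

Keeping the policy as plain data means a calling agent could tighten or relax the gates per manuscript without touching the matching logic.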