CitePrism: Human-in-the-Loop AI for Citation Auditing and Editorial Integrity

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This study addresses the challenge of auditing citations in academic manuscripts for relevance, accuracy, timeliness, and ethical compliance efficiently and at scale. The authors propose a transparent hybrid decision-support framework that integrates contextual reasoning from large language models, semantic similarity computation, metadata validation, and human review. The framework incorporates a human-in-the-loop feedback mechanism into the citation auditing pipeline for the first time and introduces a configurable three-tier review process that draws on multiple signals with tunable thresholds, balancing conservative screening with editorial control. In a preliminary evaluation on a single case-study manuscript with 104 references, the system reached a Cohen's kappa of 0.429 against human binary relevance labels. At a threshold of τ = 17 it flagged every citation that humans had labeled irrelevant, at the cost of false positives that require analyst review. The results point to its potential as a conservative screening tool for citation quality, but they do not establish general editorial performance.
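To make the idea of a three-tier review process with tunable thresholds concrete, here is a minimal sketch in Python. It is not the authors' implementation. The signal names, the 0-100 score scale, the fusion weights, the upper threshold `tau_high`, and the assumption that a lower fused score means a less relevant citation are all hypothetical; only the value 17 for the lower threshold is taken from the summary.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-citation signals; the real system's fields are not given."""
    llm_score: float         # assumed LLM relevance rating on a 0-100 scale
    embed_similarity: float  # assumed cosine similarity in [0, 1]
    metadata_ok: bool        # assumed outcome of a metadata check


def fused_score(s: Signals, w_llm: float = 0.6, w_embed: float = 0.4) -> float:
    """Weighted average of the two relevance signals on a 0-100 scale.

    The weights are illustrative assumptions, not values from the paper.
    """
    return w_llm * s.llm_score + w_embed * (100.0 * s.embed_similarity)


def triage(s: Signals, tau_low: float = 17.0, tau_high: float = 50.0) -> str:
    """Assign a citation to one of three tiers.

    Assumes lower scores mean lower relevance. tau_high is a made-up
    second threshold used only to illustrate a three-tier split.
    """
    score = fused_score(s)
    if score < tau_low:
        return "flag"    # likely irrelevant: send to the analyst first
    if score < tau_high or not s.metadata_ok:
        return "review"  # uncertain, or metadata needs a second look
    return "pass"        # no action suggested


if __name__ == "__main__":
    for s in (Signals(10, 0.1, True), Signals(40, 0.3, True), Signals(80, 0.9, True)):
        print(round(fused_score(s), 1), triage(s))
```

Both thresholds are kept as function parameters so that an editor could move them toward more conservative screening (more flags, more analyst work) or toward fewer interruptions.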

📝 Abstract

Editors and reviewers are expected to ensure that manuscripts cite relevant, accurate, current, and ethically appropriate literature, yet manuscript-level citation auditing remains largely manual, fragmented, and difficult to scale. Citation context, metadata quality, self-citation patterns, and bibliographic integrity all affect whether a reference appropriately supports a local claim. We present CitePrism, a transparent hybrid decision-support framework for editorial citation auditing that combines LLM-assisted contextual reasoning, embedding-based semantic similarity, metadata verification, integrity-oriented flags, and human-in-the-loop analyst review. CitePrism extracts citation neighborhoods, enriches reference metadata, computes fused relevance scores, surfaces metadata and self-citation review prompts, and supports configurable threshold-based triage. In a preliminary validation on a single case-study manuscript with 104 references from pavement engineering, agreement with human binary relevance labels reached Cohen's kappa = 0.429. At operating threshold tau = 17, CitePrism flagged all human-labeled irrelevant citations, while also producing false positives requiring analyst review. These results suggest that CitePrism may support conservative editorial screening and citation-quality triage, but they do not establish general editorial performance. CitePrism is intended as pilot-stage decision support, not as an autonomous misconduct detector or automated editorial decision system. Broader validation across manuscripts, domains, annotators, baselines, and deployment settings is required before operational use.
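The two reported measures, Cohen's kappa against human binary labels and the coverage of human-labeled irrelevant citations at a given threshold, can be sketched as follows. The kappa function uses the standard definition, (p_o - p_e) / (1 - p_e). The helper names and the toy labels are illustrative assumptions; they are not the paper's data and do not reproduce its 0.429 result.

```python
from typing import Sequence, Tuple


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters over the same items."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label sequences must be non-empty and equal in length")
    # Observed agreement: fraction of items where both raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    labels = set(a) | set(b)
    p_e = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


def flag_summary(irrelevant: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, int]:
    """Return (share of human-labeled irrelevant items flagged, false-positive count)."""
    caught = sum(i and f for i, f in zip(irrelevant, flagged))
    total_irrelevant = sum(irrelevant)
    false_positives = sum((not i) and f for i, f in zip(irrelevant, flagged))
    recall = caught / total_irrelevant if total_irrelevant else float("nan")
    return recall, false_positives


if __name__ == "__main__":
    # Toy labels only: 1 = relevant, 0 = irrelevant.
    human = [1, 1, 0, 0]
    system = [1, 0, 0, 0]
    print(cohens_kappa(human, system))
    print(flag_summary([h == 0 for h in human], [s == 0 for s in system]))
```

In the toy data the system flags both citations the human marked irrelevant and also one the human marked relevant, which mirrors the pattern the abstract describes: full coverage of irrelevant citations, paid for with false positives that an analyst must review.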

Problem

Research questions and friction points this paper is trying to address.

citation auditing

editorial integrity

reference relevance

bibliographic integrity

human-in-the-loop

Innovation

Methods, ideas, or system contributions that make the work stand out.

human-in-the-loop

citation auditing

LLM-assisted reasoning