🤖 AI Summary
This work addresses the high cost of human annotation of negative samples for citation faithfulness detection in Chinese Retrieval-Augmented Generation (RAG) systems. To overcome this, we propose a low-cost, two-stage human annotation paradigm and introduce CiteCheck—the first large-scale, high-quality, class-balanced Chinese benchmark dataset for citation faithfulness evaluation. Our methodology integrates LLM-assisted negative sampling, human-in-the-loop verification, and parameter-efficient fine-tuning (PEFT), substantially reducing annotation effort. Experiments reveal that state-of-the-art (SOTA) large language models still achieve only limited accuracy on this challenging benchmark, whereas smaller LLMs, fine-tuned with PEFT on LLM-generated training data, attain competitive performance. The CiteCheck dataset is publicly released, providing critical infrastructure for advancing trustworthy RAG research in Chinese.
📝 Abstract
Citation faithfulness detection is critical for enhancing retrieval-augmented generation (RAG) systems, yet large-scale Chinese datasets for this task are scarce. Existing methods face prohibitive costs because negative samples must be manually annotated. To address this, we introduce CiteCheck, the first large-scale Chinese dataset for citation faithfulness detection, constructed via a cost-effective two-stage manual annotation approach. This method balances positive and negative samples while significantly reducing annotation expenses. CiteCheck comprises training and test splits. Experiments demonstrate that: (1) the test samples are highly challenging, with even state-of-the-art LLMs failing to achieve high accuracy; and (2) training data augmented with LLM-generated negative samples enables smaller models to attain strong performance through parameter-efficient fine-tuning. CiteCheck provides a robust foundation for advancing citation faithfulness detection in Chinese RAG systems, and the dataset is publicly available to facilitate research.
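The abstract does not specify which PEFT method is used; a common choice is LoRA, which freezes the pretrained weight matrix and trains only a low-rank additive update. A minimal NumPy sketch of the idea (the layer dimensions and rank below are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Hypothetical dimensions for one projection layer of a small LLM;
# rank is the LoRA bottleneck dimension (rank << d_in, d_out).
d_out, d_in, rank = 768, 768, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight

# LoRA trains a low-rank update delta_W = B @ A instead of updating W.
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, rank))                   # trainable, zero init so delta_W starts at 0

def forward(x):
    # Adapted layer: the frozen path W @ x plus the low-rank path B @ (A @ x).
    return W @ x + B @ (A @ x)

full_params = W.size                 # what full fine-tuning would update
lora_params = A.size + B.size        # what LoRA actually trains
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")
print(f"reduction: {full_params / lora_params:.1f}x")
```

With these toy dimensions, LoRA trains 12,288 parameters instead of 589,824 (a 48x reduction), which is why PEFT makes fine-tuning smaller models on LLM-generated training data cheap.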