🤖 AI Summary
This paper addresses sentence-level citation deficiencies, such as missing, mismatched, or unverifiable citations, in large language model (LLM) responses; existing remedies rely heavily on costly manual annotation. To this end, we propose a self-supervised sentence-level citation alignment framework. Methodologically, we introduce a novel context-ablation reward mechanism based on the model's own feedback: by automatically ablating the cited context sentences, we generate fine-grained, annotation-free reward signals that jointly guide best-of-N sampling during inference and direct preference optimization (DPO) fine-tuning during training. The framework eliminates the need for human-annotated citations while improving citation fidelity both at inference time and through the fine-tuned model weights. Evaluated on five long-context question-answering tasks in LongBench-Cite, our approach achieves up to a 5.3-point improvement in citation F1, significantly boosting citation accuracy, traceability, and reliability.
📝 Abstract
We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses. Instead of only relying on costly and labor-intensive annotations, SelfCite leverages a reward signal provided by the LLM itself through context ablation: If a citation is necessary, removing the cited text from the context should prevent the same response; if sufficient, retaining the cited text alone should preserve the same response. This reward can guide the inference-time best-of-N sampling strategy to improve citation quality significantly, as well as be used in preference optimization to directly fine-tune the models for generating better citations. The effectiveness of SelfCite is demonstrated by increasing citation F1 up to 5.3 points on the LongBench-Cite benchmark across five long-form question answering tasks.
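The context-ablation idea above can be made concrete as a reward over log-probabilities: removing the cited sentences should make the model much less likely to regenerate the response (necessity), while keeping only the cited sentences should largely preserve that likelihood (sufficiency). The sketch below is a minimal, hypothetical illustration of this principle; `ablation_reward` and `toy_log_prob` are illustrative names, and the toy scorer merely stands in for a real LLM likelihood, so the exact reward form here is an assumption rather than the paper's implementation.

```python
from typing import Callable, List

def ablation_reward(
    log_prob: Callable[[List[str], str], float],
    context: List[str],
    cited_idx: List[int],
    response: str,
) -> float:
    """Score a citation by context ablation (illustrative sketch).

    - Necessity (prob-drop): likelihood of the response should fall
      when the cited sentences are removed from the context.
    - Sufficiency (prob-hold): likelihood should be preserved when
      only the cited sentences are kept.
    """
    cited = set(cited_idx)
    full = log_prob(context, response)
    without_cited = log_prob(
        [s for i, s in enumerate(context) if i not in cited], response)
    only_cited = log_prob(
        [s for i, s in enumerate(context) if i in cited], response)
    prob_drop = full - without_cited   # large => citation is necessary
    prob_hold = only_cited - full      # near zero => citation is sufficient
    return prob_drop + prob_hold

# Toy stand-in for an LLM scorer: the "log-probability" of a response
# rises with how many of its words appear in the kept context.
def toy_log_prob(context: List[str], response: str) -> float:
    words = set(" ".join(context).lower().split())
    return float(sum(w in words for w in response.lower().split()))

ctx = ["The sky is blue.", "Water boils at 100 C.", "Cats purr."]
resp = "Water boils at 100 C."
good = ablation_reward(toy_log_prob, ctx, [1], resp)  # correct citation
bad = ablation_reward(toy_log_prob, ctx, [0], resp)   # wrong citation
```

In this toy setup the correct citation scores higher than the wrong one, which is exactly the signal used to rank best-of-N candidates or to build preference pairs for DPO-style fine-tuning.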