Improving Factual Consistency of News Summarization by Contrastive Preference Optimization

📅 2023-10-30
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 3
Influential: 0
🤖 AI Summary
To address factual hallucinations—such as spurious causal claims or fabricated details—in news summarization by large language models (LLMs), this paper proposes a Contrastive Preference Optimization (CPO) framework, the first to apply contrastive preference learning for improving factual consistency in summarization. Methodologically, CPO decouples the model’s generation tendencies toward factual versus hallucinated content and integrates probe-based supervised fine-tuning to enhance hallucination awareness and suppression without requiring explicit hallucination labels. Experiments on multiple news summarization benchmarks (e.g., XSum, CNN/DM) demonstrate that CPO significantly reduces hallucination rates and improves factual accuracy by 4.2–7.8 percentage points on average, markedly enhancing summary reliability. Key contributions include: (1) establishing a contrastive preference modeling paradigm explicitly designed for factual consistency; and (2) introducing a plug-and-play probe-based training mechanism enabling efficient, annotation-free intervention.
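The summary above describes a pairwise contrastive preference objective but does not spell it out. As a rough illustration only, a minimal DPO-style sketch of what such a loss typically looks like for a faithful vs. hallucinated summary pair (the function name, the log-probability inputs, and the `beta` value are all illustrative assumptions, not the paper's exact formulation):

```python
import math

def contrastive_preference_loss(logp_faithful, logp_hallucinated, beta=0.1):
    """Toy pairwise preference loss (DPO/CPO-style sketch).

    logp_faithful / logp_hallucinated: the model's log-probabilities of a
    faithful vs. a hallucinated summary of the same article. The loss is
    low when the model assigns higher probability to the faithful summary.
    """
    margin = beta * (logp_faithful - logp_hallucinated)
    # -log(sigmoid(margin)): pushes the model to prefer the faithful summary
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Preferring the faithful summary yields a lower loss:
loss_good = contrastive_preference_loss(-20.0, -40.0)  # faithful preferred
loss_bad = contrastive_preference_loss(-40.0, -20.0)   # hallucinated preferred
```

Minimizing a margin of this shape separates the two generation propensities; the paper's actual CPO additionally uses probing-based fine-tuning, which this toy sketch does not cover.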
📝 Abstract
Despite the recent progress in news summarization made by large language models (LLMs), they often generate summaries that are factually inconsistent with original articles, known as "hallucinations" in text generation. Unlike previous small models (e.g., BART, T5), current LLMs make fewer silly mistakes but more sophisticated ones, such as imposing cause and effect, adding false details, overgeneralizing, etc. These hallucinations are challenging to detect through traditional methods, which poses great challenges for improving the factual consistency of text summarization. In this paper, we propose Contrastive Preference Optimization (CPO) to disentangle the LLMs' propensities to generate faithful and fake content. Furthermore, we adopt a probing-based specific training method to improve their capacity of distinguishing two types of propensities. In this way, LLMs can execute the instructions more accurately and have enhanced perception of hallucinations. Experimental results show that CPO significantly improves the reliability of summarization based on LLMs.
Problem

Research questions and friction points this paper is trying to address.

Improving factual consistency in news summarization
Reducing hallucinations in large language models
Enhancing detection of sophisticated factual errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive Preference Optimization
probing-based training
enhanced hallucination perception
Huawen Feng
South China University of Technology, Alibaba Tongyi Lab, Microsoft Research Asia, Tencent Hunyuan X
NLP, Large Language Models, Post Training, Reinforcement Learning, Preference Optimization
Yan Fan
Alibaba Group
Xiong Liu
Alibaba Group
Ting-En Lin
Alibaba Group, Tongyi
Natural Language Processing, Spoken Dialogue System, Large Language Model, Deep Learning
Zekun Yao
School of Computer Science and Engineering, South China University of Technology, China
Yuchuan Wu
Alibaba Tongyi Lab
Conversational AI, Large Language Models, Social Intelligence
Fei Huang
Alibaba Group
Yongbin Li
Alibaba Group
Qianli Ma
School of Computer Science and Engineering, South China University of Technology, China