🤖 AI Summary
To address factual hallucinations in news summarization by large language models (LLMs), this paper investigates external-knowledge-augmented, multi-turn self-correction frameworks: they first generate critical verification questions, then answer them with evidentiary snippets retrieved from three complementary search engines, and iteratively refine the summary accordingly. The work systematically applies the self-correction paradigm to news summarization, revealing that the quality of retrieved search snippets and the design of few-shot prompts are decisive factors in hallucination mitigation. Experiments demonstrate a significant reduction in hallucination rates and a marked improvement in factual accuracy. G-Eval automated evaluation and human assessment show strong agreement (Spearman’s ρ > 0.92), confirming the method’s effectiveness and practicality in real-world news summarization scenarios.
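The multi-turn loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_questions`, `retrieve_snippets`, and `refine` are hypothetical stand-ins for the LLM question-generation prompt, the three search engines, and the LLM refinement step, respectively.

```python
# Hedged sketch of a multi-turn, retrieval-augmented self-correction loop.
# All three helper functions are toy stand-ins (assumptions, not the paper's code).

def generate_questions(summary: str) -> list[str]:
    # Stand-in: a real system would prompt an LLM to produce critical
    # verification questions; here we derive one per sentence.
    return [f"Is it true that: {s.strip()}?" for s in summary.split(".") if s.strip()]

def retrieve_snippets(question: str, engines: tuple[str, ...]) -> list[str]:
    # Stand-in for querying several complementary search engines.
    return [f"[{engine}] evidence for: {question}" for engine in engines]

def refine(summary: str, evidence: list[str]) -> str:
    # Stand-in for the LLM refinement step: the real system would rewrite
    # the summary against the evidence; here we only mark the revision.
    return summary + f" (checked against {len(evidence)} snippets)"

def self_correct(summary: str,
                 engines: tuple[str, ...] = ("engineA", "engineB", "engineC"),
                 turns: int = 2) -> str:
    # Iterate: ask verification questions, gather evidence, refine.
    for _ in range(turns):
        questions = generate_questions(summary)
        evidence = [s for q in questions for s in retrieve_snippets(q, engines)]
        summary = refine(summary, evidence)
    return summary
```

In a real pipeline the loop would also need a stopping criterion, e.g. terminating once the refined summary no longer changes or a factuality score stops improving.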
📝 Abstract
While large language models (LLMs) have shown remarkable capabilities in generating coherent text, they suffer from hallucinations -- factually inaccurate statements. Among the numerous approaches to tackling hallucinations, self-correcting methods are especially promising. They leverage the multi-turn nature of LLMs to iteratively generate verification questions requesting additional evidence, answer them with internal or external knowledge, and use the answers to refine the original response. These methods have been explored for encyclopedic generation, but less so for domains like news summarization. In this work, we investigate two state-of-the-art self-correcting systems by applying them to correct hallucinated summaries using evidence from three search engines. We analyze the results and provide insights into the systems' performance, revealing practical findings on the benefits of search-engine snippets and few-shot prompts, as well as high alignment between G-Eval and human evaluation.