🤖 AI Summary
To address factual hallucinations in news summarization by large language models (LLMs), this paper investigates external-knowledge-augmented, multi-turn self-correction frameworks: they first generate critical verification questions, then answer them with evidentiary snippets retrieved from three complementary search engines, and iteratively refine the summary accordingly. The work systematically applies the self-correction paradigm to news summarization, revealing that the quality of retrieved search snippets and the design of few-shot prompts are decisive factors in hallucination mitigation. Experiments demonstrate a significant reduction in hallucination rates and a marked improvement in factual accuracy. G-Eval automated evaluation and human assessment show strong agreement (Spearman’s ρ > 0.92), confirming the method’s effectiveness and practicality in real-world news summarization scenarios.
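The multi-turn loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_questions`, `retrieve_snippets`, and `refine` are hypothetical stand-ins for the LLM question-generation prompt, the three search engines, and the LLM refinement step, respectively.

```python
# Hedged sketch of a multi-turn, retrieval-augmented self-correction loop.
# All three helper functions are toy stand-ins (assumptions, not the paper's code).

def generate_questions(summary: str) -> list[str]:
    # Stand-in: a real system would prompt an LLM to produce critical
    # verification questions; here we derive one per sentence.
    return [f"Is it true that: {s.strip()}?" for s in summary.split(".") if s.strip()]

def retrieve_snippets(question: str, engines: tuple[str, ...]) -> list[str]:
    # Stand-in for querying several complementary search engines.
    return [f"[{engine}] evidence for: {question}" for engine in engines]

def refine(summary: str, evidence: list[str]) -> str:
    # Stand-in for the LLM refinement step: the real system would rewrite
    # the summary against the evidence; here we only mark the revision.
    return summary + f" (checked against {len(evidence)} snippets)"

def self_correct(summary: str,
                 engines: tuple[str, ...] = ("engineA", "engineB", "engineC"),
                 turns: int = 2) -> str:
    # Iterate: ask verification questions, gather evidence, refine.
    for _ in range(turns):
        questions = generate_questions(summary)
        evidence = [s for q in questions for s in retrieve_snippets(q, engines)]
        summary = refine(summary, evidence)
    return summary
```

In a real pipeline the loop would also need a stopping criterion, e.g. terminating once the refined summary no longer changes or a factuality score stops improving.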
📝 Abstract
While large language models (LLMs) have shown remarkable capabilities in generating coherent text, they suffer from hallucinations -- factually inaccurate statements. Among the numerous approaches to tackling hallucinations, self-correcting methods are especially promising. They leverage the multi-turn nature of LLMs to iteratively generate verification questions requesting additional evidence, answer them with internal or external knowledge, and use the answers to refine the original response. These methods have been explored for encyclopedic generation, but less so for domains like news summarization. In this work, we investigate two state-of-the-art self-correcting systems by applying them to correct hallucinated summaries using evidence from three search engines. We analyze the results and provide insights into the systems' performance, revealing practical findings on the benefits of search-engine snippets and few-shot prompts, as well as high alignment between G-Eval and human evaluation.