🤖 AI Summary
This work investigates whether plan-guided summarization improves the faithfulness of small language models (SLMs) on long narrative texts. Because existing fine-grained plans are susceptible to hallucination, we propose a high-level planning method grounded in narrative structure. Using automated evaluation and human assessment that specifically targets faithfulness and hallucination, we find that neither fine-grained plan guidance nor our high-level plan guidance significantly outperforms the plan-free baseline. The root cause is the high hallucination rate of the plans themselves, which undermines the guidance and can even propagate factual errors into the summaries. To our knowledge, this is the first systematic study exposing the limitations of plan-guided summarization for complex narratives. Our findings caution against uncritical adoption of planning in long-text and low-resource settings, where plan hallucinations compromise reliability. The study provides empirical evidence for developing trustworthy abstractive summarization systems and highlights the need for hallucination-robust planning mechanisms.
📝 Abstract
Plan-guided summarization attempts to reduce hallucinations in small language models (SLMs) by grounding generated summaries in the source text, typically by targeting fine-grained details such as dates or named entities. In this work, we investigate whether plan-based approaches in SLMs improve summarization on long-document narrative tasks. The length and complexity of narrative texts often make them difficult to summarize faithfully. We analyze existing plan-guided solutions that target fine-grained details, and we also propose our own higher-level, narrative-based plan formulation. Our results show that neither approach significantly improves on a baseline without planning, in either summary quality or faithfulness. Human evaluation reveals that while plan-guided summaries are often well grounded in their plans, the plans are as likely as the summaries to contain hallucinations. As a result, plan-guided summaries are just as unfaithful as those from models without planning. Our work serves as a cautionary tale for plan-guided approaches to summarization, especially in long, complex domains such as narrative texts.