🤖 AI Summary
To address factual inconsistency, information loss, and poor traceability in long-document summarization, this paper proposes a highlight-guided generation framework based on sentence-level self-planning. The method first identifies salient sentences via importance modeling and assembles them into a traceable content plan; it then generates the summary conditioned on that plan, decoupling content selection from surface realization. The paper reports improved faithfulness and better retention of fine-grained detail. On the GovReport benchmark, the best variant improves ROUGE-L by 4.1 points and SummaC scores by about 35%. Qualitative analysis indicates that critical details are preserved more completely, yielding more accurate and insightful summaries across domains.
📝 Abstract
We introduce a novel approach to long-context summarisation, highlight-guided generation, that leverages sentence-level information as a content plan to improve the traceability and faithfulness of generated summaries. Our framework applies self-planning methods to identify important content and then generates a summary conditioned on the plan. We explore both end-to-end and two-stage variants of the approach, finding that the two-stage pipeline performs better on long and information-dense documents. Experiments on long-form summarisation datasets demonstrate that our method consistently improves factual consistency while preserving relevance and overall quality. On GovReport, our best approach improves ROUGE-L by 4.1 points and achieves gains of about 35% in SummaC scores. Qualitative analysis shows that highlight-guided summarisation helps preserve important details, leading to more accurate and insightful summaries across domains.
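The two-stage variant can be pictured with a minimal sketch: first select highlight sentences as a content plan, then generate a summary conditioned on that plan. Everything below is an illustrative assumption rather than the paper's implementation. The word-frequency scorer is a simple stand-in for the self-planning step, and `split_sentences`, `select_highlights`, `build_prompt`, `summarise`, and the `generate` callable are hypothetical names.

```python
import re
from collections import Counter
from typing import Callable, List, Tuple

Highlight = Tuple[int, str]  # (sentence index in the document, sentence text)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_highlights(sentences: List[str], k: int) -> List[Highlight]:
    """Stage 1 (stand-in): score sentences by mean document word frequency.

    Assumption: this heuristic only illustrates the interface; the paper
    uses self-planning to choose highlights, not word counts.
    """
    tokenised = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    freq = Counter(w for toks in tokenised for w in toks)
    scores = [
        sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        for toks in tokenised
    ]
    # Take the k best-scoring sentences, breaking ties by earlier position.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    # Return the plan in document order so each entry is traceable by index.
    return [(i, sentences[i]) for i in sorted(top)]


def build_prompt(document: str, highlights: List[Highlight]) -> str:
    """Stage 2 input: the document plus the indexed content plan."""
    plan = "\n".join(f"[{i}] {s}" for i, s in highlights)
    return (
        f"Document:\n{document}\n\n"
        f"Highlighted sentences (content plan):\n{plan}\n\n"
        "Write a summary that covers the highlighted sentences."
    )


def summarise(
    document: str, generate: Callable[[str], str], k: int = 3
) -> Tuple[List[Highlight], str]:
    """Run both stages; `generate` is any text-generation callable."""
    highlights = select_highlights(split_sentences(document), k)
    return highlights, generate(build_prompt(document, highlights))
```

Returning the highlights alongside the summary is what makes the output traceable in this sketch: each plan entry carries the index of the source sentence it came from. An end-to-end variant would instead ask a single model call to emit both the plan and the summary.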