🤖 AI Summary
This study investigates the effectiveness and limitations of reasoning capabilities in large language models (LLMs) for abstractive summarization. Addressing the underexamined hypothesis that “reasoning universally improves summary quality,” we systematically evaluate eight explicit and implicit reasoning strategies—including chain-of-thought, self-consistency, and stepwise reasoning—across three prominent reasoning-oriented LLMs and eight diverse benchmark datasets. Results reveal a critical trade-off: explicit reasoning enhances linguistic fluency but degrades factual consistency, whereas implicit reasoning exhibits the opposite pattern; moreover, increasing internal reasoning steps does not consistently improve performance. Our core contribution is the empirical identification of an inherent tension between summary quality and factual fidelity in abstractive summarization, leading to the principle that “faithful compression outweighs excessive reasoning.” This work provides both empirical evidence and methodological guidance for the principled design and deployment of reasoning mechanisms in LLM-based summarization systems.
📝 Abstract
While the reasoning capabilities of Large Language Models (LLMs) excel in analytical tasks such as mathematics and code generation, their utility for abstractive summarization remains widely assumed but largely unverified. To bridge this gap, we first tailor general reasoning strategies to the summarization domain. We then conduct a systematic, large-scale comparative study of 8 reasoning strategies and 3 Large Reasoning Models (LRMs) across 8 diverse datasets, assessing both summary quality and faithfulness. Our findings show that reasoning is not a universal solution; its effectiveness depends heavily on the specific strategy and context. In particular, we observe a trade-off between summary quality and factual faithfulness: explicit reasoning strategies tend to improve fluency at the expense of factual grounding, while implicit reasoning in LRMs exhibits the inverse pattern. Furthermore, increasing an LRM's internal reasoning budget does not improve, and can even hurt, factual consistency, suggesting that effective summarization demands faithful compression rather than creative over-thinking.