🤖 AI Summary
This work addresses factual inconsistency and hallucination in clinical text summarization, problems that commonly arise from the length, noise, and heterogeneity of source documents. To tackle this, the paper introduces AgenticSum, an agent-based, zero-shot, inference-time framework (the first of its kind for this task) that decouples generation from verification through four coordinated stages: context selection, draft generation, attention-guided fact verification, and targeted revision. By using internal attention signals to pinpoint weakly supported segments, the framework directs corrective refinements to exactly the content that needs them. Comprehensive evaluations on two public datasets, covering automatic metrics, LLM-as-a-judge assessments, and human judgments, show that the proposed method significantly outperforms the base large language model and strong baselines, substantially improving the factual faithfulness of generated summaries.
📝 Abstract
Large language models (LLMs) offer substantial promise for automating clinical text summarization, yet maintaining factual consistency remains challenging due to the length, noise, and heterogeneity of clinical documentation. We present AgenticSum, an inference-time, agentic framework that separates context selection, generation, verification, and targeted correction to reduce hallucinated content. The framework decomposes summarization into coordinated stages that compress task-relevant context, generate an initial draft, identify weakly supported spans using internal attention grounding signals, and selectively revise flagged content under supervisory control. We evaluate AgenticSum on two public datasets, using reference-based metrics, LLM-as-a-judge assessment, and human evaluation. Across these measures, AgenticSum demonstrates consistent improvements over vanilla LLMs and other strong baselines. Our results indicate that structured, agentic design with targeted correction offers an effective inference-time solution for improving clinical note summarization with LLMs.
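The staged pipeline described in the abstract can be illustrated with a minimal control-flow sketch. Note that every function body below is a hypothetical stand-in: the paper's actual stages rely on an LLM and internal attention-based grounding signals, which are mocked here with trivial string heuristics purely to show how selection, generation, verification, and targeted revision are decoupled.

```python
# Sketch of the four-stage, inference-time pipeline (all logic is illustrative,
# not the paper's implementation).

def select_context(document: str, max_sents: int = 3) -> str:
    """Stage 1: compress task-relevant context (mock: keep leading sentences)."""
    sents = [s.strip() for s in document.split(".") if s.strip()]
    return ". ".join(sents[:max_sents]) + "."

def generate_draft(context: str) -> str:
    """Stage 2: produce an initial draft (mock: echo the compressed context)."""
    return context

def verify_spans(draft: str, source: str) -> list[str]:
    """Stage 3: flag weakly supported spans. The paper uses attention-guided
    grounding; this mock flags draft sentences not found in the source."""
    return [s.strip() for s in draft.split(".")
            if s.strip() and s.strip() not in source]

def revise(draft: str, flagged: list[str]) -> str:
    """Stage 4: targeted correction (mock: drop the flagged sentences)."""
    kept = [s.strip() for s in draft.split(".")
            if s.strip() and s.strip() not in flagged]
    return ". ".join(kept) + "." if kept else ""

def summarize(document: str) -> str:
    """Supervisory loop: revise only when verification flags content."""
    context = select_context(document)
    draft = generate_draft(context)
    flagged = verify_spans(draft, document)
    return revise(draft, flagged) if flagged else draft
```

The point of the sketch is the separation of concerns: generation never self-certifies, and revision touches only the spans the verifier flagged, rather than regenerating the whole summary.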