🤖 AI Summary
This study addresses the documentation burden that clinical notes place on healthcare providers and asks how useful LLM-generated summaries are for downstream heart failure (HF) prediction. The authors propose Distillnote, a framework for LLM-based clinical note summarization with three techniques: one-step direct summarization, and a divide-and-conquer approach comprising structured summarization focused on independent clinical insights and distilled summarization that further condenses the structured summaries. Over 64,000 admission note summaries are generated. Distilled summaries achieve 79% text compression and up to an 18.2% AUPRC improvement over an LLM trained on the full notes, with an average 6.9x compression-to-performance ratio and significantly fewer hallucinations. Summary quality is assessed via LLM-as-judge evaluation and blinded pairwise comparisons with clinicians, who favour one-step summaries on relevance and clinical actionability. The summaries are released on PhysioNet.
📝 Abstract
Large language models (LLMs) offer unprecedented opportunities to generate concise summaries of patient information and alleviate the burden of clinical documentation that overwhelms healthcare providers. We present Distillnote, a framework for LLM-based clinical note summarization, and generate over 64,000 admission note summaries through three techniques: (1) One-step, direct summarization, and a divide-and-conquer approach involving (2) Structured summarization focused on independent clinical insights, and (3) Distilled summarization that further condenses the Structured summaries. We test how useful the summaries are by using them to predict heart failure, compared to a model trained on the original notes. Distilled summaries achieve 79% text compression and up to an 18.2% improvement in AUPRC compared to an LLM trained on the full notes. We also evaluate the quality of the generated summaries through an LLM-as-judge evaluation as well as through blinded pairwise comparisons with clinicians. Evaluations indicate that one-step summaries are favoured by clinicians for relevance and clinical actionability, while distilled summaries offer optimal efficiency (avg. 6.9x compression-to-performance ratio) and significantly reduce hallucinations. We release our summaries on PhysioNet to encourage future research.