🤖 AI Summary
Clinical documentation consumes substantial time, severely impeding physician–patient interaction. To address this critical efficiency bottleneck, we propose an end-to-end framework for generating structured clinical notes directly from doctor–patient dialogues. Our contributions are threefold: (1) We introduce CliniKnote—the first high-quality, human-annotated dialogue–note paired dataset; (2) We design K-SOAP, an enhanced structured format extending SOAP with a dedicated Keyword layer to support fine-grained clinical semantic modeling; (3) We develop a format-constrained decoding strategy coupled with medical expert-in-the-loop data curation, overcoming key limitations of standard LLM fine-tuning. Evaluated on real-world, complex clinical dialogues, our method achieves state-of-the-art performance across accuracy, completeness, and clinical utility metrics, while significantly improving generation efficiency over baselines.
📝 Abstract
Writing clinical notes and documenting medical exams is a critical task for healthcare professionals and a vital part of patient care. However, writing these notes manually is time-consuming and reduces the time clinicians can spend on direct patient interaction and other tasks. Consequently, automated clinical note generation has emerged as a clinically meaningful area of research within AI for health. In this paper, we present three key contributions to clinical note generation using large language models (LLMs). First, we introduce CliniKnote, a comprehensive dataset of 1,200 complex doctor-patient conversations paired with their full clinical notes. This dataset, created and curated by medical experts with the help of modern neural networks, provides a valuable resource for training and evaluating models on clinical note generation tasks. Second, we propose the K-SOAP (Keyword, Subjective, Objective, Assessment, and Plan) note format, which enhances traditional SOAP~\cite{podder2023soap} (Subjective, Objective, Assessment, and Plan) notes by adding a keyword section at the top, allowing for quick identification of essential information. Third, we develop an automatic pipeline to generate K-SOAP notes from doctor-patient conversations and benchmark a range of modern LLMs on multiple metrics. Our results demonstrate significant improvements in efficiency and performance compared to standard LLM fine-tuning methods.
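To make the K-SOAP layout concrete, the following is a minimal sketch of how such a note could be represented and rendered with the keyword section first. The class name `KSoapNote`, its field names, the `render` method, and the sample note content are all illustrative assumptions; the abstract does not specify a schema, and this is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class KSoapNote:
    """Hypothetical container for a K-SOAP note (field names are assumptions)."""

    keywords: list[str] = field(default_factory=list)
    subjective: str = ""
    objective: str = ""
    assessment: str = ""
    plan: str = ""

    def render(self) -> str:
        """Render the note as plain text, with the keyword section at the top."""
        sections = [
            ("Keywords", ", ".join(self.keywords)),
            ("Subjective", self.subjective),
            ("Objective", self.objective),
            ("Assessment", self.assessment),
            ("Plan", self.plan),
        ]
        return "\n".join(f"{title}: {body}" for title, body in sections)


# Invented example content, for illustration only (not taken from CliniKnote).
note = KSoapNote(
    keywords=["cough", "fever"],
    subjective="Patient reports a dry cough for three days.",
    objective="Temperature 38.1 C; lungs clear on auscultation.",
    assessment="Likely viral upper respiratory infection.",
    plan="Rest, fluids, and follow-up if symptoms persist.",
)
rendered = note.render()
print(rendered)
```

Placing the keywords first means a reader, or a downstream program, can scan the first line for the essential terms before reading the four SOAP sections.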