🤖 AI Summary
Addressing two key challenges in clinical text generation, namely fragmented, heterogeneous patient data and the loss of clinically relevant information when long, dense notes must fit into a limited context, this paper proposes CLI-RAG, a retrieval-augmented framework for structured generation. Methodologically, it combines a hierarchical chunking strategy that follows clinical document structure with a two-stage retrieval process. A global stage uses evidence-based queries to select relevant note types, and a local stage extracts high-value content within those notes, so that retrieval is targeted at both the document and section levels. Using large language models (LLMs), the framework generates structured progress notes for individual hospital visits from 15 note types in MIMIC-III. Experiments show an average alignment score of 87.7%, above the 80.7% baseline from clinician-authored notes, while preserving temporal and semantic alignment across visits. The outputs are also highly consistent across LLMs, which the authors present as supporting reproducibility, reliability, and clinical trust.
📝 Abstract
Large language models (LLMs) have shown promising capabilities in clinical text generation in both zero-shot and few-shot settings. However, real-world applications face two key challenges: (1) patient data is highly unstructured, heterogeneous, and scattered across multiple note types; and (2) clinical notes are often long and semantically dense, making naive prompting infeasible because of context-length constraints and the risk of omitting clinically relevant information.
We introduce CLI-RAG (Clinically Informed Retrieval-Augmented Generation), a domain-specific framework for structured, clinically grounded text generation with LLMs. It combines a novel hierarchical chunking strategy that respects clinical document structure with a task-specific, dual-stage retrieval mechanism. The global stage identifies relevant note types using evidence-based queries, while the local stage extracts high-value content within those notes, yielding relevance at both the document and section levels.
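To make the chunking and dual-stage retrieval ideas concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration, not CLI-RAG's actual implementation: section headers are detected with a hypothetical rule (an upper-case label followed by a colon), long sections are split into fixed word windows, and a plain token-overlap score stands in for the paper's evidence-based queries and retriever (a real system would more likely use embeddings or BM25). The names `hierarchical_chunks`, `dual_stage_retrieve`, `overlap`, `top_note_types`, and `top_chunks`, as well as the toy notes, are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical header rule: an upper-case label followed by a colon, e.g. "HOSPITAL COURSE:".
HEADER_RE = re.compile(r"^([A-Z][A-Z /&-]+):\s*(.*)$")


@dataclass
class Chunk:
    note_type: str  # document level, e.g. "Radiology"
    section: str    # section level, e.g. "IMPRESSION"
    text: str


def hierarchical_chunks(note_type: str, note_text: str, max_words: int = 200) -> list[Chunk]:
    """Split a note at section headers, then split long sections into fixed word windows."""
    sections, header, lines = [], "PREAMBLE", []
    for raw in note_text.splitlines():
        line = raw.strip()
        match = HEADER_RE.match(line)
        if match:
            if lines:  # close the previous section before starting a new one
                sections.append((header, " ".join(lines)))
            header, lines = match.group(1), ([match.group(2)] if match.group(2) else [])
        elif line:
            lines.append(line)
    if lines:
        sections.append((header, " ".join(lines)))

    chunks = []
    for section, body in sections:
        words = body.split()
        for start in range(0, len(words), max_words):
            chunks.append(Chunk(note_type, section, " ".join(words[start:start + max_words])))
    return chunks


def overlap(query: str, text: str) -> int:
    """Stand-in relevance score: number of shared lower-cased tokens."""
    q = Counter(re.findall(r"[a-z0-9]+", query.lower()))
    t = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(min(q[w], t[w]) for w in q)


def dual_stage_retrieve(query: str, chunks: list[Chunk],
                        top_note_types: int = 2, top_chunks: int = 3) -> list[Chunk]:
    # Global stage: rate each note type by its best-matching chunk and keep the top types.
    best: dict[str, int] = {}
    for c in chunks:
        best[c.note_type] = max(best.get(c.note_type, 0), overlap(query, c.text))
    kept = set(sorted(best, key=best.get, reverse=True)[:top_note_types])
    # Local stage: rank section-level chunks only inside the selected note types.
    candidates = [c for c in chunks if c.note_type in kept]
    candidates.sort(key=lambda c: overlap(query, c.text), reverse=True)
    return candidates[:top_chunks]


# Toy notes, invented for illustration only.
notes = {
    "Discharge summary": "HOSPITAL COURSE:\nPatient treated for pneumonia with antibiotics.\n"
                         "DISCHARGE MEDICATIONS:\nLevofloxacin daily.",
    "Radiology": "FINDINGS:\nRight lower lobe opacity consistent with pneumonia.\n"
                 "IMPRESSION:\nPneumonia.",
    "Nursing": "ASSESSMENT:\nPatient resting comfortably, ambulating.",
}
chunks = [c for note_type, text in notes.items() for c in hierarchical_chunks(note_type, text)]
hits = dual_stage_retrieve("pneumonia antibiotics treatment", chunks)
for hit in hits:
    print(hit.note_type, "|", hit.section, "|", hit.text)
```

With these toy notes, the nursing note is dropped at the global stage, and the local stage returns the discharge summary's HOSPITAL COURSE section followed by the radiology FINDINGS and IMPRESSION sections. Scoring note types by their best chunk is one simple design choice; summing or averaging chunk scores per note type would be equally reasonable alternatives.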
We apply the system to generate structured progress notes for individual hospital visits, drawing on 15 clinical note types from the MIMIC-III dataset. Experiments show that it preserves temporal and semantic alignment across visits, achieving an average alignment score of 87.7% compared with 80.7% for real clinician-authored notes. The generated outputs are also highly consistent across LLMs, reinforcing the deterministic behavior essential for reproducibility, reliability, and clinical trust.