Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a modular retrieval-augmented generation (RAG) pipeline that aims to reduce clinicians’ documentation burden by converting nurse–patient conversations into structured clinical observations that conform to a predefined schema with value-type constraints. The pipeline combines schema-constrained prompting (with either the full schema or a pruned candidate schema), retrieval of exemplars from the training set, deterministic schema-based post-processing, and a second-pass audit. It is evaluated with two LLM backbones, Llama-4-Scout-17B-16E-Instruct and GPT-5.2, each paired with a corresponding embedding model. On the MEDIQA-SYNUR task, the best configuration (GPT-5.2 with the full schema, RAG, and second-pass auditing) achieves an F1 score of 80.36%. RAG consistently improves performance, the optimal degree of schema constraint depends on the model, and the second-pass audit yields modest additional gains by correcting residual schema-adherence errors.
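To make the idea of deterministic post-processing concrete, here is a minimal Python sketch of a schema check applied to model output. It is an illustration under stated assumptions, not the authors' implementation: the `SCHEMA` dictionary, its field names, the three value types, and the `postprocess` function are all hypothetical and are not taken from the MEDIQA-SYNUR schema.

```python
from typing import Any

# Hypothetical schema: field names, value types, and options are illustrative only.
SCHEMA: dict[str, dict[str, Any]] = {
    "pain_level": {"type": "numeric"},
    "mobility": {"type": "single_select",
                 "options": ["independent", "assisted", "bedbound"]},
    "symptoms": {"type": "multi_select",
                 "options": ["nausea", "dizziness", "cough"]},
}


def postprocess(observations: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Coerce or drop raw observations so that every result obeys SCHEMA."""
    cleaned = []
    for obs in observations:
        spec = SCHEMA.get(obs.get("id"))
        if spec is None:
            continue  # field is not defined in the schema
        value = obs.get("value")
        if spec["type"] == "numeric":
            try:
                value = float(value)
            except (TypeError, ValueError):
                continue  # not parseable as a number
        elif spec["type"] == "single_select":
            value = str(value).strip().lower()
            if value not in spec["options"]:
                continue  # not one of the allowed options
        elif spec["type"] == "multi_select":
            items = value if isinstance(value, list) else [value]
            normalized = [str(item).strip().lower() for item in items]
            value = [item for item in normalized if item in spec["options"]]
            if not value:
                continue  # no allowed option survived
        cleaned.append({"id": obs["id"], "value": value})
    return cleaned
```

Because the rules are deterministic, the same raw output always yields the same cleaned output. In this sketch, invalid entries are dropped rather than repaired, which is one possible design choice among several.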

📝 Abstract

Conversational nurse-patient transcripts contain actionable observations, but converting these transcripts into structured representations at scale remains challenging. Documentation burden is substantial, with prior studies showing clinicians spend large portions of their workday on documentation and related desk work rather than direct patient care. MEDIQA-SYNUR focuses on observation extraction from conversational nurse-patient transcripts, requiring systems to normalize these narratives into a predefined schema with value-type constraints. We propose a modular retrieval-augmented generation (RAG) pipeline that uses the training set as an exemplar corpus and combines schema-constrained prompting (full schema vs. pruned candidate schema), deterministic schema-based postprocessing, and a second-pass audit. We evaluate two LLM backbones, Llama-4-Scout-17B-16E-Instruct and GPT-5.2, each with a corresponding embedding model for RAG. Our best configuration uses GPT-5.2 with the full schema, RAG, and second-pass auditing, achieving an F1 score of 80.36%. Overall, our results show that RAG consistently improves performance, while the optimal degree of schema constraint depends on the model, and second-pass auditing yields modest additional gains by correcting residual schema-adherence errors.
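The retrieval and prompting steps described in the abstract can be sketched roughly as follows. This is a toy illustration, not the paper's pipeline: a bag-of-words cosine similarity stands in for the real embedding models, and the pruning rule (keep only schema fields that appear in the retrieved exemplars) is an assumption made for this example. The names `embed`, `retrieve`, `build_prompt`, `SCHEMA`, and `TRAIN`, as well as the sample transcripts, are hypothetical.

```python
import json
import math
import re
from collections import Counter

# Hypothetical schema and training exemplars, for illustration only.
SCHEMA = {
    "pain_level": "numeric",
    "mobility": "single_select: independent | assisted | bedbound",
    "symptoms": "multi_select: nausea | dizziness | cough",
}
TRAIN = [
    {"transcript": "The patient rates her pain at seven out of ten",
     "observations": [{"id": "pain_level", "value": 7}]},
    {"transcript": "He walks with a cane and needs help on stairs",
     "observations": [{"id": "mobility", "value": "assisted"}]},
    {"transcript": "She has been coughing and feels dizzy",
     "observations": [{"id": "symptoms", "value": ["cough", "dizziness"]}]},
]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Return the k training exemplars most similar to the query transcript."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda ex: cosine(q, embed(ex["transcript"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(transcript: str, schema: dict, exemplars: list,
                 prune: bool = False) -> str:
    """Assemble a schema-constrained prompt with retrieved exemplars."""
    if prune:
        # Assumed pruning rule: keep only fields seen in the retrieved exemplars.
        keep = {o["id"] for ex in exemplars for o in ex["observations"]}
        schema = {name: spec for name, spec in schema.items() if name in keep}
    lines = ["Extract observations that conform to this schema:"]
    lines += [f"- {name}: {spec}" for name, spec in schema.items()]
    for ex in exemplars:
        lines.append("Example transcript: " + ex["transcript"])
        lines.append("Example output: " + json.dumps(ex["observations"]))
    lines += ["Transcript: " + transcript, "Output:"]
    return "\n".join(lines)
```

A call such as `build_prompt(text, SCHEMA, retrieve(text, TRAIN, k=1), prune=True)` yields a prompt restricted to the fields seen in the nearest exemplar, while `prune=False` exposes the full schema. Which variant works better is, per the abstract, model-dependent.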

Problem

Research questions and friction points this paper is trying to address.

clinical information extraction

schema-constrained normalization

conversational transcripts

structured representation

documentation burden

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation (RAG)

Schema-Constrained Extraction

Clinical Information Extraction