🤖 AI Summary
This work addresses hallucinations in clinical summaries generated by large language models, which often contain claims unsupported by electronic health records (EHRs), while existing alignment methods risk omitting critical information. The authors propose the first integration of claim verification into a Direct Preference Optimization (DPO) framework: a retrieval-augmented verifier labels each claim–evidence pair with a single token (Supported, Not Supported, or Not Addressed). Verifier margins are aggregated into a coverage-aware utility that is used to mine contradiction-anchored, length-controlled preference pairs, suppressing hallucinations without falling into the “say-less” degeneration, in which a model avoids errors by omitting content. Evaluated on MIMIC-III-Ext-VeriFact-BHC (100 ICU patients from MIMIC-III), the method reduces the rate of unsupported claims from 10.7% to 1.9% as judged by a local verifier and from 11.6% to 6.4% as judged by GPT-4o, while raising summary validity from 76.7% to 82.5% and maintaining informative summary length.
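For reference, a minimal sketch of the standard per-pair DPO loss that the mined preferences are distilled through, computed from summed sequence log-probabilities under the trainable policy and a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions, not details taken from the paper.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) summary pair.

    Each argument is a summed log-probability log p(y | x) of a full summary.
    beta is an assumed default that scales the implicit reward.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the chosen summary over the rejected one.
    z = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # The loss is -log(sigmoid(z)) = softplus(-z), evaluated stably for either sign of z.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy matches the reference the loss is log 2; it falls as the policy shifts probability toward the verifier-preferred summary and rises when it shifts toward the rejected one.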
📝 Abstract
Brief Hospital Course (BHC) narratives must be clinically useful yet faithful to fragmented EHR evidence. LLM-based clinical summarizers still introduce unsupported statements, and alignment can encourage omissions ("say-less" degeneration). We introduce VERI-DPO, which uses claim verification to mine preferences and distill them into the summarizer with Direct Preference Optimization (DPO). On MIMIC-III-Ext-VeriFact-BHC (100 ICU patients; patient-level splits), we train a retrieval-augmented verifier to label claim-evidence pairs as Supported, Not Supported, or Not Addressed via a single-token format. The verifier scores sentence-level claims from sampled BHC candidates and aggregates margins into a coverage-aware utility to mine length-controlled, contradiction-anchored preference pairs. On held-out patients, verifier-mined preferences separate candidates by contradiction density, and VERI-DPO reduces Not Supported claim rates from 10.7% to 1.9% (local verifier judge) and from 11.6% to 6.4% (GPT-4o judge), while improving validity from 76.7% to 82.5% and maintaining informative length.
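The mining step can be sketched roughly as follows. Only the three verifier labels come from the abstract; everything else is a hypothetical stand-in for illustration, not the paper's exact formulation: the `Claim`/`Candidate` types, the margin definition, the utility (mean margin plus a bonus per supported claim), the word-count length ratio, and the rule that the rejected candidate must contain strictly more Not Supported claims than the chosen one.

```python
from dataclasses import dataclass, field
from itertools import combinations

SUPPORTED = "Supported"
NOT_SUPPORTED = "Not Supported"
NOT_ADDRESSED = "Not Addressed"


@dataclass
class Claim:
    label: str     # verifier verdict, decoded from its single output token
    margin: float  # hypothetical: score(Supported) - score(Not Supported)


@dataclass
class Candidate:
    text: str
    claims: list = field(default_factory=list)

    @property
    def n_words(self) -> int:
        return len(self.text.split())

    @property
    def n_contradictions(self) -> int:
        return sum(c.label == NOT_SUPPORTED for c in self.claims)


def utility(cand: Candidate, coverage_weight: float = 0.5) -> float:
    """Hypothetical coverage-aware utility: mean margin plus a per-supported-claim bonus."""
    if not cand.claims:
        return 0.0
    mean_margin = sum(c.margin for c in cand.claims) / len(cand.claims)
    n_supported = sum(c.label == SUPPORTED for c in cand.claims)
    return mean_margin + coverage_weight * n_supported


def mine_pairs(cands, max_len_ratio: float = 1.25):
    """Return (chosen, rejected) pairs that are length-matched and contradiction-anchored."""
    pairs = []
    for a, b in combinations(cands, 2):
        shorter, longer = sorted((a.n_words, b.n_words))
        if shorter == 0 or longer / shorter > max_len_ratio:
            continue  # length control: skip pairs whose lengths differ too much
        chosen, rejected = (a, b) if utility(a) >= utility(b) else (b, a)
        # Contradiction anchor: keep the pair only if the lower-utility summary
        # also contains strictly more Not Supported claims.
        if rejected.n_contradictions > chosen.n_contradictions:
            pairs.append((chosen, rejected))
    return pairs


# Toy usage with made-up candidates and verifier outputs.
good = Candidate("Admitted with sepsis treated with antibiotics and discharged home",
                 [Claim(SUPPORTED, 2.0), Claim(SUPPORTED, 1.5)])
bad = Candidate("Admitted with sepsis treated with surgery and discharged to rehab",
                [Claim(SUPPORTED, 2.0), Claim(NOT_SUPPORTED, -3.0)])
terse = Candidate("Sepsis", [Claim(SUPPORTED, 2.0)])
pairs = mine_pairs([good, bad, terse])  # only (good, bad) survives length control
```

Requiring the utility ranking and the contradiction count to agree keeps every pair anchored on a contradiction, while the per-supported-claim bonus keeps the utility from simply rewarding shorter, emptier summaries.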