VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses hallucinations in clinical summaries generated by large language models, which often contain claims unsupported by the underlying electronic health records (EHRs); existing alignment methods, in turn, risk omitting critical information. The authors propose the first integration of claim verification into a Direct Preference Optimization (DPO) framework, using retrieval-augmented validation and single-token annotations to label claim–evidence relationships. They construct contradiction-aware, length-controlled preference pairs that enable coverage-aware preference mining, suppressing hallucinations without succumbing to "say-less" degeneration. Evaluated on the MIMIC-III dataset, the method reduces the rate of unsupported claims from 10.7% to 1.9% under a local verifier judge and from 11.6% to 6.4% under a GPT-4o judge, while improving summary validity to 82.5% (from 76.7%) and preserving informational completeness.

📝 Abstract
Brief Hospital Course (BHC) narratives must be clinically useful yet faithful to fragmented EHR evidence. LLM-based clinical summarizers still introduce unsupported statements, and alignment can encourage omissions ("say-less" degeneration). We introduce VERI-DPO, which uses claim verification to mine preferences and distill them into the summarizer with Direct Preference Optimization (DPO). On MIMIC-III-Ext-VeriFact-BHC (100 ICU patients; patient-level splits), we train a retrieval-augmented verifier to label claim-evidence pairs as Supported, Not Supported, or Not Addressed via a single-token format. The verifier scores sentence-level claims from sampled BHC candidates and aggregates margins into a coverage-aware utility to mine length-controlled, contradiction-anchored preference pairs. On held-out patients, verifier-mined preferences separate candidates by contradiction density, and VERI-DPO reduces Not Supported claim rates from 10.7% to 1.9% (local verifier judge) and from 11.6% to 6.4% (GPT-4o judge), while improving validity from 76.7% to 82.5% and maintaining informative length.
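The mining step the abstract describes (per-claim verifier labels, margins aggregated into a coverage-aware utility, length-controlled pair selection) can be sketched roughly as follows. The label names follow the paper, but every function name, the utility form, the weights, and the length-control rule are illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch of verifier-based preference mining. Label names
# ("Supported", "Not Supported") follow the paper; the utility form,
# weights, and length-control rule are assumptions for illustration.

def candidate_utility(claim_margins, alpha=1.0, beta=0.5):
    """Aggregate per-claim verifier (label, margin) pairs into a
    coverage-aware utility: reward supported claims, penalize
    contradictions, and add a mild per-claim coverage bonus so the
    objective does not favor "say-less" degeneration."""
    support = sum(m for lbl, m in claim_margins if lbl == "Supported")
    contra = sum(m for lbl, m in claim_margins if lbl == "Not Supported")
    return support - alpha * contra + beta * len(claim_margins)

def mine_preference_pair(candidates, max_len_ratio=1.3):
    """Pick a (chosen, rejected) pair from sampled BHC candidates.

    Each candidate is a dict with 'text' and 'claims' (a list of
    (label, margin) pairs from the verifier). The rejected candidate is
    the lowest-utility one whose length stays within max_len_ratio of
    the chosen one, a crude stand-in for length control."""
    scored = sorted(candidates,
                    key=lambda c: candidate_utility(c["claims"]),
                    reverse=True)
    chosen = scored[0]
    for cand in reversed(scored):  # walk up from the worst candidate
        if cand is chosen:
            continue
        lo = min(len(chosen["text"]), len(cand["text"])) or 1
        hi = max(len(chosen["text"]), len(cand["text"]))
        if hi / lo <= max_len_ratio:
            return chosen, cand
    return None  # no length-comparable rejected candidate found
```

A pair mined this way would then be fed to standard DPO training, with the high-utility candidate as the preferred response.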
Problem

Research questions and friction points this paper is trying to address.

clinical summarization
evidence faithfulness
unsupported statements
say-less degeneration
claim verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

claim verification
Direct Preference Optimization
evidence-aware alignment
clinical summarization
contradiction-anchored preferences
Weixin Liu
Baidu Inc.
Natural Language Processing, Machine Learning, Deep Learning
Congning Ni
Vanderbilt University Medical Center, Nashville, TN, USA
Qingyuan Song
Vanderbilt University, Nashville, TN, USA
Susannah L. Rose
Vanderbilt University Medical Center, Nashville, TN, USA
Christopher Symons
Lirio, Knoxville, TN, USA
Murat Kantarcioglu
Professor of Computer Science, Virginia Tech
Security and Privacy in AI, Databases, Data Science, Computer Security
Bradley A. Malin
Vanderbilt University, Nashville, TN, USA; Vanderbilt University Medical Center, Nashville, TN, USA
Zhijun Yin
Vanderbilt University, Nashville, TN, USA; Vanderbilt University Medical Center, Nashville, TN, USA