VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization

📅 2026-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses hallucinations in LLM-generated clinical summaries, which often contain claims unsupported by the electronic health record (EHR), while alignment methods that penalize such claims risk omitting critical information. The authors integrate claim verification into a Direct Preference Optimization (DPO) framework: a retrieval-augmented verifier labels each claim–evidence pair as Supported, Not Supported, or Not Addressed using a single-token output. Verifier scores are aggregated into a coverage-aware utility that is used to mine length-controlled, contradiction-anchored preference pairs, suppressing hallucinations without the “say-less” degeneration. On held-out patients from MIMIC-III-Ext-VeriFact-BHC, the method reduces the Not Supported claim rate from 10.7% to 1.9% under the local verifier judge and from 11.6% to 6.4% under a GPT-4o judge, while improving validity from 76.7% to 82.5% and maintaining informative length.
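
As a point of reference for the DPO step mentioned above, the standard DPO loss for a single preference pair can be sketched as follows. This is the generic published DPO objective, not code from the paper; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration, since the listing does not state the training hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) summary pair.

    Inputs are summed token log-probabilities of each summary under the
    trainable policy and the frozen reference model. `beta` is an assumed
    default; the source does not report the value used.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # summary over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the verifier-preferred summary.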

📝 Abstract

Brief Hospital Course (BHC) narratives must be clinically useful yet faithful to fragmented EHR evidence. LLM-based clinical summarizers still introduce unsupported statements, and alignment can encourage omissions ("say-less" degeneration). We introduce VERI-DPO, which uses claim verification to mine preferences and distill them into the summarizer with Direct Preference Optimization (DPO). On MIMIC-III-Ext-VeriFact-BHC (100 ICU patients; patient-level splits), we train a retrieval-augmented verifier to label claim-evidence pairs as Supported, Not Supported, or Not Addressed via a single-token format. The verifier scores sentence-level claims from sampled BHC candidates and aggregates margins into a coverage-aware utility to mine length-controlled, contradiction-anchored preference pairs. On held-out patients, verifier-mined preferences separate candidates by contradiction density, and VERI-DPO reduces Not Supported claim rates from 10.7% to 1.9% (local verifier judge) and from 11.6% to 6.4% (GPT-4o judge), while improving validity from 76.7% to 82.5% and maintaining informative length.
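
The preference-mining step described in the abstract might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the `Candidate` type, the label codes `"S"`/`"NS"`/`"NA"`, the count-based utility with its penalty weights, the word-count length ratio, and the thresholds are all hypothetical, and the abstract does not give the paper's actual utility formula or filtering rules. The sketch only shows how three ideas can combine: a utility that rewards the number of supported claims rather than their fraction (so a shorter summary does not win by default), a length filter, and a requirement that the rejected summary carry more Not Supported claims than the chosen one.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Candidate:
    text: str
    labels: list  # hypothetical per-sentence verdicts: "S", "NS", or "NA"


def utility(cand, ns_penalty=2.0, na_penalty=0.5):
    """Assumed coverage-aware utility: supported count minus penalties."""
    supported = cand.labels.count("S")
    not_supported = cand.labels.count("NS")
    not_addressed = cand.labels.count("NA")
    return supported - ns_penalty * not_supported - na_penalty * not_addressed


def mine_pairs(cands, min_len_ratio=0.8, min_gap=1.0):
    """Return (chosen, rejected) pairs that pass all three filters."""
    pairs = []
    for a, b in combinations(cands, 2):
        chosen, rejected = (a, b) if utility(a) >= utility(b) else (b, a)
        # Length control: keep only pairs of similar word count.
        len_c, len_r = len(chosen.text.split()), len(rejected.text.split())
        if min(len_c, len_r) / max(len_c, len_r, 1) < min_len_ratio:
            continue
        # Contradiction anchoring: rejected must have more NS claims.
        if rejected.labels.count("NS") <= chosen.labels.count("NS"):
            continue
        # Require a clear utility gap so the preference is not noise.
        if utility(chosen) - utility(rejected) < min_gap:
            continue
        pairs.append((chosen, rejected))
    return pairs
```

Pairs that survive these filters would then be passed to DPO training as (chosen, rejected) examples for the same patient record.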

Problem

Research questions and friction points this paper is trying to address.

clinical summarization

evidence faithfulness

unsupported statements

say-less degeneration

claim verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

claim verification

Direct Preference Optimization

evidence-aware alignment