CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical vision-language models (VLMs) hallucinate in chest X-ray analysis, undermining clinical reliability. Existing reinforcement learning approaches (e.g., GRPO) rely on sparse outcome-based rewards, yielding verbose, unverifiable reasoning that obscures factual errors. To address this, we propose an alignment paradigm that replaces outcome supervision with process supervision: we structure reasoning into “disease, relation, anatomy” triplets, construct a fine-grained knowledge graph, and introduce the first entity-relation matching consistency reward mechanism. We further incorporate hard-sample mining and two atomic-level constraints: logical coherence and factual accuracy. Evaluated on MIMIC-CXR-VQA, our method achieves state-of-the-art performance and surpasses prior approaches while training on only 5K samples. It significantly reduces hallucination and redundancy, generating concise, verifiable, and clinically trustworthy chain-of-thought explanations.

📝 Abstract
Medical Vision-Language Models (VLMs) are prone to hallucinations, compromising clinical reliability. While reinforcement learning methods like Group Relative Policy Optimization (GRPO) offer a low-cost alignment solution, their reliance on sparse, outcome-based rewards inadvertently encourages models to "overthink" -- generating verbose, convoluted, and unverifiable Chain-of-Thought reasoning to justify answers. This focus on outcomes obscures factual errors and poses significant safety risks. To address this, we propose CheXPO-v2, a novel alignment framework that shifts from outcome to process supervision. Our core innovation is a Knowledge Graph Consistency Reward mechanism driven by Entity-Relation Matching. By explicitly parsing reasoning steps into structured "Disease, Relation, Anatomy" triplets, we provide fine-grained supervision that penalizes incoherent logic and hallucinations at the atomic level. Integrating this with a hard-example mining strategy, our approach significantly outperforms GRPO and state-of-the-art models on benchmarks like MIMIC-CXR-VQA. Crucially, CheXPO-v2 achieves new state-of-the-art accuracy using only 5k samples, demonstrating exceptional data efficiency while producing clinically sound and verifiable reasoning. The project source code is publicly available at: https://github.com/ecoxial2007/CheX-Phi4MM.
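To make the "sparse, outcome-based reward" concern concrete, here is a minimal sketch of group-relative advantage computation in the style of GRPO. It is not taken from the CheXPO-v2 code base. The function names, the exact-match reward, and the use of the population standard deviation are assumptions for illustration, and real implementations differ in detail.

```python
from statistics import mean, pstdev


def outcome_reward(predicted_answer: str, gold_answer: str) -> float:
    """Sparse outcome reward (assumed form): 1.0 on exact match, else 0.0."""
    return 1.0 if predicted_answer.strip().lower() == gold_answer.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one question: (final answer, reasoning text).
group = [
    ("yes", "Brief, checkable reasoning."),
    ("yes", "Long, convoluted reasoning that may contain factual errors."),
    ("no", "Some reasoning."),
    ("no", "Other reasoning."),
]
rewards = [outcome_reward(answer, "yes") for answer, _ in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the first two responses receive identical advantages because only the final answer is scored. The reasoning text never enters the reward, which is the gap that process supervision is meant to close.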
Problem

Research questions and friction points this paper is trying to address.

Medical VLMs produce unreliable hallucinations in clinical applications
Outcome-based RL alignment encourages unverifiable verbose reasoning chains
Current methods obscure factual errors creating patient safety risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Knowledge Graph Consistency Reward for process supervision
Parses reasoning into structured Disease-Relation-Anatomy triplets
Combines fine-grained supervision with hard-example mining strategy
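
The triplet-based process reward described above might be sketched as follows. This page does not give the paper's actual reward formula, so the precision-style score, the `Triplet` type, the `located_at` relation name, and the example findings are all hypothetical and serve only to illustrate entity-relation matching against a reference knowledge graph.

```python
from typing import Iterable, NamedTuple


class Triplet(NamedTuple):
    disease: str
    relation: str
    anatomy: str


def normalize(t: Triplet) -> Triplet:
    """Lowercase and strip each field so matching ignores surface formatting."""
    return Triplet(*(field.strip().lower() for field in t))


def consistency_reward(predicted: Iterable[Triplet], reference: Iterable[Triplet]) -> float:
    """Assumed scoring rule: fraction of predicted triplets present in the reference graph."""
    pred = {normalize(t) for t in predicted}
    if not pred:
        return 0.0  # no parsable reasoning steps, so no process reward
    ref = {normalize(t) for t in reference}
    return len(pred & ref) / len(pred)


# Hypothetical reference graph for one study.
reference_graph = [
    Triplet("pleural effusion", "located_at", "left lung base"),
    Triplet("cardiomegaly", "located_at", "heart"),
]

# Triplets parsed from a model's reasoning: one supported, one hallucinated.
parsed_reasoning = [
    Triplet("Pleural Effusion", "located_at", "left lung base"),
    Triplet("pneumothorax", "located_at", "right apex"),
]

reward = consistency_reward(parsed_reasoning, reference_graph)
```

Here the hallucinated pneumothorax triplet lowers the reward even if the final answer is correct, which is the kind of atomic-level penalty the summary attributes to process supervision.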
Xiao Liang
School of Computer Science and Technology, Xidian University, Xi’an, China
Yuxuan An
School of Computer Science and Technology, Xidian University, Xi’an, China
Di Wang
School of Computer Science and Technology, Xidian University, Xi’an, China
Jiawei Hu
PhD Student, University of New South Wales
Mobile Computing, Ubiquitous Computing
Zhicheng Jiao
Brown University Health, Warren Alpert Medical School of Brown University
Medical image analysis, Health informatics
Bin Jing
School of Biomedical Engineering, Capital Medical University, Beijing, China
Quan Wang
School of Computer Science and Technology, Xidian University, Xi’an, China