CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency

📅 2025-12-18

🤖 AI Summary

Medical vision-language models (VLMs) suffer from hallucinations in chest X-ray analysis, undermining clinical reliability. Existing reinforcement learning approaches (e.g., GRPO) rely on sparse, outcome-based rewards, which yield verbose, unverifiable reasoning that obscures factual errors. To address this, we propose an alignment paradigm based on *process supervision rather than outcome supervision*: we structure reasoning into "disease–relation–anatomy" triplets, construct a fine-grained knowledge graph, and introduce an *entity–relation matching consistency reward* that scores each reasoning chain against it. We further incorporate hard-example mining and two atomic-level constraints: logical coherence and factual accuracy. On MIMIC-CXR-VQA, our method achieves state-of-the-art performance, surpassing prior approaches while training on only 5K samples. It significantly reduces hallucination and redundancy, generating concise, verifiable, and clinically trustworthy chain-of-thought explanations.
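The entity–relation matching idea can be illustrated with a small sketch. It assumes, hypothetically, that each reasoning step is emitted as one `disease | relation | anatomy` line and that a reference set of triplets is available from the knowledge graph; the reward is then an F1 score over exactly matching triplets. The line format, the relation names such as `located_in`, and the F1 scoring are illustrative assumptions, not CheXPO-v2's actual implementation.

```python
from typing import Iterable, Set, Tuple

# (disease, relation, anatomy), e.g. ("pleural effusion", "located_in", "left lung").
Triplet = Tuple[str, str, str]


def normalize(triplet: Triplet) -> Triplet:
    """Lower-case and trim each field so trivial formatting differences still match."""
    disease, relation, anatomy = triplet
    return (disease.strip().lower(), relation.strip().lower(), anatomy.strip().lower())


def parse_triplets(reasoning: str) -> Set[Triplet]:
    """Hypothetical parser: assumes one 'disease | relation | anatomy' triplet per line."""
    triplets: Set[Triplet] = set()
    for line in reasoning.splitlines():
        parts = line.split("|")
        if len(parts) == 3 and all(p.strip() for p in parts):
            triplets.add(normalize((parts[0], parts[1], parts[2])))
    return triplets


def kg_consistency_reward(reasoning: str, reference: Iterable[Triplet]) -> float:
    """F1 between triplets stated in the reasoning and reference knowledge-graph triplets."""
    predicted = parse_triplets(reasoning)
    gold = {normalize(t) for t in reference}
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0  # no overlap (also covers empty predictions or references)
    precision = matched / len(predicted)  # penalizes hallucinated triplets
    recall = matched / len(gold)  # penalizes omitted findings
    return 2 * precision * recall / (precision + recall)


reference = [
    ("pleural effusion", "located_in", "left lung"),
    ("cardiomegaly", "involves", "heart"),
]
cot = "Pleural effusion | located_in | left lung\nPneumothorax | located_in | right lung"
print(kg_consistency_reward(cot, reference))  # 0.5
```

Under this scoring, a hallucinated triplet (here the pneumothorax) lowers precision and an omitted finding (the cardiomegaly) lowers recall, so both failure modes reduce the reward even when the final answer happens to be correct.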

📝 Abstract

Medical Vision-Language Models (VLMs) are prone to hallucinations, compromising clinical reliability. While reinforcement learning methods like Group Relative Policy Optimization (GRPO) offer a low-cost alignment solution, their reliance on sparse, outcome-based rewards inadvertently encourages models to "overthink": generating verbose, convoluted, and unverifiable Chain-of-Thought reasoning to justify answers. This focus on outcomes obscures factual errors and poses significant safety risks. To address this, we propose CheXPO-v2, a novel alignment framework that shifts from outcome to process supervision. Our core innovation is a Knowledge Graph Consistency Reward mechanism driven by Entity-Relation Matching. By explicitly parsing reasoning steps into structured "Disease, Relation, Anatomy" triplets, we provide fine-grained supervision that penalizes incoherent logic and hallucinations at the atomic level. Integrating this with a hard-example mining strategy, our approach significantly outperforms GRPO and state-of-the-art models on benchmarks like MIMIC-CXR-VQA. Crucially, CheXPO-v2 achieves new state-of-the-art accuracy using only 5k samples, demonstrating exceptional data efficiency while producing clinically sound and verifiable reasoning. The project source code is publicly available at: https://github.com/ecoxial2007/CheX-Phi4MM.
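GRPO standardizes each sampled response's reward against the other responses drawn for the same prompt (subtract the group mean, divide by the group standard deviation). The sketch below shows why a purely outcome-based reward can leave reasoning unsupervised: if every response in a group reaches the correct answer, every advantage is zero no matter how sound or hallucinated the reasoning was. The `process_score` values and the 0.5 mixing weight are hypothetical stand-ins for a process reward such as the Knowledge Graph Consistency Reward; the abstract does not specify how CheXPO-v2 combines its rewards.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def outcome_reward(answer: str, gold: str) -> float:
    """Sparse outcome reward: 1.0 for a correct final answer, else 0.0."""
    return 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0


def combined_reward(answer: str, gold: str, process_score: float, weight: float = 0.5) -> float:
    """Hypothetical mix of outcome and process rewards; the weight is an assumption."""
    return outcome_reward(answer, gold) + weight * process_score


# Four sampled responses to one question: all reach the correct answer "yes",
# but their reasoning chains have different (hypothetical) process scores.
answers = ["yes", "yes", "yes", "yes"]
process_scores = [1.0, 0.5, 0.0, 0.25]

outcome_only = [outcome_reward(a, "yes") for a in answers]
with_process = [combined_reward(a, "yes", s) for a, s in zip(answers, process_scores)]

print(group_relative_advantages(outcome_only))  # [0.0, 0.0, 0.0, 0.0]: no learning signal
print(group_relative_advantages(with_process))  # higher process scores get positive advantages
```

With the process term included, responses with better-grounded reasoning are pushed up and poorly grounded ones pushed down, even though all four final answers are identical.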

Problem

Medical VLMs hallucinate, undermining reliability in clinical applications

Outcome-based RL alignment encourages verbose, unverifiable reasoning chains

Outcome-focused methods obscure factual errors, creating patient-safety risks

Innovation

Uses a Knowledge Graph Consistency Reward, driven by entity-relation matching, for process supervision

Parses reasoning into structured Disease-Relation-Anatomy triplets

Combines fine-grained supervision with a hard-example mining strategy
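The abstract does not detail the hard-example mining criterion. One common approach, assumed here purely for illustration, is to sample several responses per question from the current model and keep the questions it answers correctly only some of the time, since those are the ones that yield non-zero group-relative advantages. `Sample`, `mine_hard_examples`, and the `budget` parameter are hypothetical names, not part of the CheXPO-v2 code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """A training question plus correctness of responses sampled from the current model."""
    question_id: str
    rollout_correct: List[bool] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        # Fraction of sampled responses that reached the correct answer.
        if not self.rollout_correct:
            return 0.0
        return sum(self.rollout_correct) / len(self.rollout_correct)


def mine_hard_examples(samples: List[Sample], budget: int) -> List[Sample]:
    """Hypothetical hard-example filter: keep partially solved questions, hardest first.

    Questions with pass rate 1.0 give zero group-relative advantage; questions with
    pass rate 0.0 may be unlearnable or mislabeled, so both are dropped here.
    """
    informative = [s for s in samples if 0.0 < s.pass_rate < 1.0]
    informative.sort(key=lambda s: s.pass_rate)  # lowest pass rate (hardest) first
    return informative[:budget]


pool = [
    Sample("q1", [True, True, True, True]),
    Sample("q2", [True, False, False, False]),
    Sample("q3", [False, False, False, False]),
    Sample("q4", [True, True, False, True]),
]
print([s.question_id for s in mine_hard_examples(pool, budget=2)])  # ['q2', 'q4']
```

Questions the model always answers correctly (q1) or never answers correctly (q3) are dropped, and the remaining ones are ranked hardest first so that a fixed training budget goes to the most informative examples.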

🔎 Similar Papers

RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training