Reasoning Visual Language Model for Chest X-Ray Analysis

📅 2025-10-27
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Current vision-language models (VLMs) for chest X-ray analysis lack interpretability, hindering clinical audit and human-AI collaboration. To address this, we propose the first reasoning-first VLM framework tailored to thoracic radiology diagnosis. It explicitly models radiologists' systematic differential diagnostic process via chain-of-thought prompting, generating traceable, uncertainty-aware reasoning paths with alternative hypotheses. Methodologically, the framework integrates high-fidelity visual encoding, two-stage supervised fine-tuning, and verifiability-guided reinforcement learning to jointly model multiple abnormalities. Experiments demonstrate competitive performance on multi-label classification tasks. Radiologist evaluations confirm that the generated reasoning trajectories significantly enhance diagnostic confidence, accelerate report generation, and enable error tracing and decision audit. This work establishes a novel paradigm for trustworthy AI-assisted diagnosis in medical imaging.

๐Ÿ“ Abstract
Vision-language models (VLMs) have shown strong promise for medical image analysis, but most remain opaque, offering predictions without the transparent, stepwise reasoning clinicians rely on. We present a framework that brings chain-of-thought (CoT) reasoning to chest X-ray interpretation. Inspired by reasoning-first training paradigms, our approach is designed to learn how experts reason, not just what they conclude, by aligning intermediate steps with observable image evidence and radiology workflow. Beyond accuracy, the explicit reasoning traces support clinical auditability: they reveal why a conclusion was reached, which alternatives were considered, and where uncertainty remains, enabling quality assurance, error analysis, and safer human-AI collaboration. Our model couples high-fidelity visual encoding with a two-stage training recipe: reasoning-style supervised fine-tuning (SFT) followed by reinforcement learning (RL) that uses verifiable rewards over a list of X-ray abnormalities. The model outputs reasoning that mirrors radiologists' systematic thought process, uncertainty, and differential diagnosis. In out-of-distribution evaluation, the approach achieves competitive multi-label classification while improving interpretability. In a reader study with expert radiologists, full reasoning traces increased confidence, supported error auditing, and reduced time to finalize reports. We release code and the model NV-Reason-CXR-3B to support community progress toward trustworthy, explainable AI in chest radiography and other medical imaging tasks where reasoning quality is as critical as prediction quality.
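The abstract describes RL with "verifiable rewards over a list of X-ray abnormalities" but does not specify the reward function. A common choice for multi-label settings is a set-based F1 score between the abnormality labels extracted from the model's answer and the ground-truth labels; the sketch below assumes that formulation and is purely illustrative, not the paper's actual implementation.

```python
# Hypothetical verifiable reward for multi-label abnormality prediction.
# Assumption: the reward is set-based F1 between predicted and gold labels;
# the paper does not publish its exact reward design.

def verifiable_reward(predicted: list[str], ground_truth: list[str]) -> float:
    """Return an F1-style reward in [0, 1] comparing two label sets."""
    pred = {p.strip().lower() for p in predicted}
    gold = {g.strip().lower() for g in ground_truth}
    if not pred and not gold:
        return 1.0  # model correctly reports no abnormal findings
    tp = len(pred & gold)  # true positives: labels in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting `["cardiomegaly", "pleural effusion"]` against a gold set of `["cardiomegaly"]` gives precision 0.5 and recall 1.0, so the reward is 2/3. Because the reward is computed directly from the gold label list, it needs no learned reward model, which is what makes it "verifiable."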
Problem

Research questions and friction points this paper is trying to address.

Enhances transparency in medical image analysis through reasoning
Aligns AI reasoning steps with radiology workflow and evidence
Supports clinical auditability and safer human-AI collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-thought reasoning for chest X-ray interpretation
Two-stage training with SFT and verifiable RL rewards
Explicit reasoning traces enabling clinical auditability
Authors

Andriy Myronenko (NVIDIA, Santa Clara, CA)
Dong Yang (NVIDIA, Santa Clara, CA)
Baris Turkbey (National Cancer Institute, National Institutes of Health)
Mariam Aboian (CHOP/UPenn, Philadelphia, PA)
Sena Azamat (Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey)
Esra Akcicek (Lunenfeld-Tanenbaum Research Institute, Toronto, Canada)
Hongxu Yin (NVIDIA, Santa Clara, CA)
Pavlo Molchanov (NVIDIA Research)
Marc Edgar (NVIDIA, Santa Clara, CA)
Yufan He (NVIDIA)
Pengfei Guo (NVIDIA, Santa Clara, CA)
Yucheng Tang (NVIDIA)
Daguang Xu (NVIDIA)