3DReasonKnee: Advancing Grounded Reasoning in Medical Vision Language Models

📅 2025-10-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current 3D vision-language models lack anatomically grounded, clinically aligned stepwise reasoning, which limits how far clinicians can trust them in real-world diagnostic settings. To address this, the authors introduce 3DReasonKnee, which they describe as the first grounded reasoning dataset for 3D medical images. It comprises 7,970 knee MRI volumes annotated by expert clinicians with 3D bounding boxes, diagnostic questions, multi-step clinical reasoning chains, and structured severity assessments, yielding 494k high-quality quintuples. On top of the dataset they establish ReasonKnee-Bench, a benchmark for evaluating localization and diagnostic accuracy across anatomical regions and diagnostic questions. Five state-of-the-art VLMs are benchmarked to provide baseline performance. The work contributes both foundational data and an evaluation framework for 3D, clinically aligned, localized decision-making in orthopedic AI.

📝 Abstract

Current Vision-Language Models (VLMs) struggle to ground anatomical regions in 3D medical images and reason about them in a step-by-step manner, a key requirement of real-world diagnostic assessment. This ability is essential for aligning model outputs with the diagnostic workflows clinicians use in practice, enabling trustworthy clinician-AI collaboration. Existing 3D datasets provide localization labels, but none support this "grounded reasoning" ability. To address this gap, we introduce 3DReasonKnee, the first 3D grounded reasoning dataset for medical images, which provides 494k high-quality quintuples derived from 7,970 3D knee MRI volumes. Each quintuple includes: (1) the 3D MRI volume, (2) a diagnostic question targeting a specific anatomical region, (3) a 3D bounding box localizing the relevant anatomical structures, (4) clinician-generated diagnostic reasoning steps that explicitly detail the 3D reasoning process, and (5) structured severity assessments for the relevant anatomical region. The creation and validation of 3DReasonKnee, involving over 450 hours of expert clinician time for manually segmenting MRIs and generating reasoning chains, ensures its superior quality and clinical relevance. We establish ReasonKnee-Bench to evaluate localization and diagnostic accuracy, providing insight into VLM ability to perform grounding and severity assessment across anatomical regions and diagnostic inquiries. We benchmark five state-of-the-art VLMs, providing baseline performance for ReasonKnee-Bench. By providing this unique resource of expert-annotated 3D reasoning pathways, 3DReasonKnee serves as a repository of orthopedic surgeons' diagnostic expertise and offers a vital testbed for advancing multimodal medical AI systems towards 3D, clinically aligned, localized decision-making capabilities. The dataset is available at: https://huggingface.co/datasets/rajpurkarlab/3DReasonKnee
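
To make the five-part structure described in the abstract concrete, here is a minimal Python sketch of how one quintuple might be represented in memory. All class names, field names, the example question, and the severity key are hypothetical illustrations; the actual schema of the released dataset is not specified here and may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BoundingBox3D:
    """Axis-aligned 3D box in voxel coordinates (hypothetical layout)."""
    x_min: int
    y_min: int
    z_min: int
    x_max: int
    y_max: int
    z_max: int


@dataclass
class GroundedReasoningSample:
    """One quintuple: volume, question, box, reasoning steps, severity."""
    volume_path: str              # (1) path to the 3D MRI volume
    question: str                 # (2) diagnostic question about a region
    bbox: BoundingBox3D           # (3) box localizing the relevant structures
    reasoning_steps: List[str]    # (4) clinician-written reasoning chain
    severity: Dict[str, int]      # (5) structured severity assessment


# Illustrative values only; none of these come from the dataset.
sample = GroundedReasoningSample(
    volume_path="volumes/example_knee.nii.gz",
    question="Is there cartilage damage in the medial femoral region?",
    bbox=BoundingBox3D(10, 20, 5, 60, 80, 25),
    reasoning_steps=[
        "Locate the medial femoral condyle.",
        "Inspect cartilage thickness across adjacent slices.",
    ],
    severity={"cartilage_grade": 1},
)
```

Keeping the bounding box and severity as structured fields rather than free text is what would let a benchmark score localization and diagnosis separately.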

Problem

Research questions and friction points this paper is trying to address.

Addressing VLMs' inability to ground anatomical regions in 3D medical images

Providing structured reasoning steps for clinical diagnostic workflows

Establishing benchmarks for 3D localization and severity assessment accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the 3DReasonKnee dataset for grounded reasoning over 3D medical images

Provides clinician-generated diagnostic reasoning steps in 3D

Establishes ReasonKnee-Bench, a benchmark for evaluating localization and diagnostic accuracy
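
The page does not state which metric the benchmark uses to score localization. A common choice for comparing a predicted 3D box against a reference box is intersection over union (IoU), sketched below in Python purely as an assumption. The function name `iou_3d` and the `(x_min, y_min, z_min, x_max, y_max, z_max)` box layout are hypothetical.

```python
from typing import Sequence


def iou_3d(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned 3D boxes given as
    (x_min, y_min, z_min, x_max, y_max, z_max)."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])          # start of the overlap on this axis
        hi = min(a[i + 3], b[i + 3])  # end of the overlap on this axis
        if hi <= lo:
            return 0.0                # no overlap on one axis means no overlap
        inter *= hi - lo

    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2×2×2 boxes offset by one voxel along every axis share a 1×1×1 overlap, so their IoU is 1 / (8 + 8 - 1) = 1/15.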

🔎 Similar Papers

MedRG: Medical Report Grounding with Multi-modal Large Language Model

2024-04-10, arXiv.org, Citations: 5

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

2024-02-09, European Conference on Computer Vision, Citations: 29