X-PCR: A Benchmark for Cross-modality Progressive Clinical Reasoning in Ophthalmic Diagnosis

📅 2026-04-22

🤖 AI Summary
This work addresses the lack of frameworks for evaluating progressive reasoning and cross-modal integration in medical multimodal large language models (MLLMs), particularly for ophthalmic diagnosis. To fill this gap, we introduce X-PCR, the first benchmark tailored to the complete ophthalmic diagnostic workflow. It comprises two tasks covering 52 eye diseases: a six-stage progressive reasoning chain, from image quality assessment to clinical decision-making, and a cross-modality reasoning task that integrates six imaging modalities. Built from 26,415 images and 177,868 expert-verified visual question-answer (VQA) pairs curated from 51 public datasets, X-PCR provides a clinically grounded framework for assessing cross-modal progressive reasoning. A systematic evaluation of 21 state-of-the-art MLLMs reveals significant deficiencies in complex diagnostic reasoning and multimodal coordination, pointing to clear directions for future model improvement.

📝 Abstract
Despite significant progress in Multi-modal Large Language Models (MLLMs), their clinical reasoning capacity for multi-modal diagnosis remains largely unexamined. Current benchmarks, which mostly rely on single-modality data, cannot evaluate the progressive reasoning and cross-modal integration essential to clinical practice. We introduce the Cross-Modality Progressive Clinical Reasoning (X-PCR) benchmark, the first comprehensive evaluation of MLLMs across a complete ophthalmology diagnostic workflow, with two reasoning tasks: 1) a six-stage progressive reasoning chain spanning image quality assessment to clinical decision-making, and 2) a cross-modality reasoning task integrating six imaging modalities. The benchmark comprises 26,415 images and 177,868 expert-verified VQA pairs curated from 51 public datasets, covering 52 ophthalmic diseases. Evaluation of 21 MLLMs reveals critical gaps in progressive reasoning and cross-modal integration. Dataset and code: https://github.com/CVI-SZU/X-PCR.
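The abstract does not say how results on the six-stage chain are aggregated. As a minimal sketch, assuming each VQA pair carries a hypothetical case identifier and a stage number (1 for image quality assessment through 6 for clinical decision-making), the snippet below computes accuracy for each stage and a stricter chain-level accuracy that credits a case only when every stage question for it is answered correctly. The `VQAItem` fields, the option-letter matching, and the chain-level metric are illustrative assumptions, not the evaluation code in the X-PCR repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VQAItem:
    # Hypothetical record layout; the released X-PCR files may use different fields.
    case_id: str      # groups the questions asked about one patient case
    stage: int        # position in the six-stage chain (1 = image quality, 6 = decision)
    answer: str       # ground-truth option letter, e.g. "B"
    prediction: str   # option letter parsed from the model's reply


def is_correct(item: VQAItem) -> bool:
    # Exact match on the option letter, ignoring case and surrounding whitespace.
    return item.prediction.strip().upper() == item.answer.strip().upper()


def per_stage_accuracy(items: list[VQAItem]) -> dict[int, float]:
    # Fraction of correctly answered questions at each stage of the chain.
    correct: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for item in items:
        total[item.stage] += 1
        correct[item.stage] += int(is_correct(item))
    return {stage: correct[stage] / total[stage] for stage in sorted(total)}


def chain_accuracy(items: list[VQAItem]) -> float:
    # A case counts as correct only if every stage question for that case is correct.
    case_ok: dict[str, bool] = {}
    for item in items:
        case_ok[item.case_id] = case_ok.get(item.case_id, True) and is_correct(item)
    return sum(case_ok.values()) / len(case_ok)
```

For example, with two cases where the first is answered correctly at both stages and the second fails only at stage 1, per-stage accuracy is `{1: 0.5, 2: 1.0}` while chain-level accuracy is `0.5`. The gap between the two views is one way to expose errors that propagate along a progressive reasoning chain.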
Problem

Research questions and friction points this paper is trying to address.

Clinical Reasoning
Cross-modality
Progressive Reasoning
Ophthalmic Diagnosis
Multi-modal Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modality Reasoning
Progressive Clinical Reasoning
Multi-modal Large Language Models
Ophthalmic Diagnosis
Visual Question Answering