🤖 AI Summary
Multimodal large language models (MLLMs) remain weak in fine-grained visual perception, medical knowledge comprehension, and clinical reasoning in high-stakes settings such as ophthalmic surgery. To address this, the paper introduces EyePCR, the first multi-level benchmark for surgical cognitive assessment. EyePCR evaluates three core competencies, visual perception, knowledge comprehension, and clinical reasoning, by combining fine-grained attribute annotations, a large-scale ophthalmic knowledge graph, and clinically grounded reasoning tasks. Building on EyePCR, the authors propose EyePCR-MLLM, a domain-adapted variant of Qwen2.5-VL-7B trained with structured knowledge modeling, vision-language question generation, knowledge-graph augmentation, and domain-adaptive training. Experiments show that EyePCR-MLLM achieves the highest perception MCQ accuracy among compared models, outperforms leading open-source MLLMs on knowledge comprehension and clinical reasoning, and rivals commercial models such as GPT-4.1, substantially improving the cognitive reliability and clinical applicability of surgical video analysis.
📝 Abstract
MLLMs (Multimodal Large Language Models) have showcased remarkable capabilities, but their performance in high-stakes, domain-specific scenarios such as surgical settings remains largely under-explored. To address this gap, we develop **EyePCR**, a large-scale benchmark for ophthalmic surgery analysis, grounded in structured clinical knowledge to evaluate cognition across *Perception*, *Comprehension* and *Reasoning*. EyePCR offers a richly annotated corpus of more than 210k VQAs, covering 1,048 fine-grained attributes for multi-view perception, a medical knowledge graph of more than 25k triplets for comprehension, and four clinically grounded reasoning tasks. The rich annotations enable in-depth cognitive analysis, simulating how surgeons perceive visual cues and combine them with domain knowledge to make decisions, and thereby greatly improve models' cognitive ability. In particular, **EyePCR-MLLM**, a domain-adapted variant of Qwen2.5-VL-7B, achieves the highest accuracy on MCQs for *Perception* among compared models and outperforms open-source models in *Comprehension* and *Reasoning*, rivalling commercial models like GPT-4.1. EyePCR reveals the limitations of existing MLLMs in surgical cognition and lays the foundation for benchmarking and enhancing the clinical reliability of surgical video understanding models.
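To make the evaluation setup concrete, here is a minimal sketch of how EyePCR-style items (MCQs tagged with a cognitive level, fine-grained attributes, and knowledge-graph triplets) might be represented and scored per competency. The schema, field names, and sample items are hypothetical illustrations, not the benchmark's actual format or data.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical item schema: the paper describes MCQs organized by
# Perception/Comprehension/Reasoning, fine-grained visual attributes,
# and (head, relation, tail) knowledge-graph triplets; the exact
# representation below is an assumption for illustration.
@dataclass
class EyePCRItem:
    level: str                      # "perception" | "comprehension" | "reasoning"
    question: str
    options: dict[str, str]         # option key -> option text
    answer: str                     # gold option key
    attributes: list[str] = field(default_factory=list)                # fine-grained visual attributes
    triplets: list[tuple[str, str, str]] = field(default_factory=list) # KG triplets grounding the item

def per_level_accuracy(items, predict):
    """Score a model's MCQ predictions separately for each cognitive level."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        total[item.level] += 1
        if predict(item) == item.answer:
            correct[item.level] += 1
    return {lvl: correct[lvl] / total[lvl] for lvl in total}

# Usage with two invented items and a trivial baseline that always picks "A":
items = [
    EyePCRItem("perception", "Which instrument is visible in the frame?",
               {"A": "phaco probe", "B": "capsulorhexis forceps"}, "A",
               attributes=["instrument:phaco_probe"]),
    EyePCRItem("comprehension", "What does capsulorhexis create?",
               {"A": "a corneal flap", "B": "an opening in the anterior lens capsule"}, "B",
               triplets=[("capsulorhexis", "creates", "anterior capsule opening")]),
]
print(per_level_accuracy(items, predict=lambda item: "A"))
# {'perception': 1.0, 'comprehension': 0.0}
```

Reporting accuracy per level rather than as a single aggregate mirrors the benchmark's goal of separating what a model *sees* from what it *knows* and how it *reasons*.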