EyePCR: A Comprehensive Benchmark for Fine-Grained Perception, Knowledge Comprehension and Clinical Reasoning in Ophthalmic Surgery

📅 2025-09-19
🤖 AI Summary
Multimodal large language models (MLLMs) remain limited in high-stakes medical applications. In ophthalmic surgery in particular, they show weak fine-grained visual perception, shallow medical knowledge comprehension, and insufficient clinical reasoning. To address this, the paper introduces EyePCR, the first multi-level benchmark for surgical cognitive assessment. EyePCR evaluates three core competencies (visual perception, knowledge understanding, and clinical reasoning) through fine-grained attribute annotations, a large-scale ophthalmic knowledge graph, and clinically grounded reasoning tasks. Building on EyePCR, the authors propose EyePCR-MLLM, a domain-specific model that integrates structured knowledge modeling, vision-language question generation, knowledge graph augmentation, and domain-adaptive training. In the reported experiments, EyePCR-MLLM achieves the highest accuracy among the compared models on perception-oriented multiple-choice questions, outperforms open-source MLLMs on knowledge understanding and clinical reasoning, and rivals GPT-4.1 on those two tasks. The authors position the benchmark as a basis for improving the cognitive reliability and clinical applicability of surgical video analysis.

📝 Abstract
Multimodal Large Language Models (MLLMs) have showcased remarkable capabilities, but their performance in high-stakes, domain-specific scenarios such as surgical settings remains largely under-explored. To address this gap, we develop EyePCR, a large-scale benchmark for ophthalmic surgery analysis, grounded in structured clinical knowledge to evaluate cognition across Perception, Comprehension and Reasoning. EyePCR offers a richly annotated corpus with more than 210k VQAs, which cover 1048 fine-grained attributes for multi-view perception, a medical knowledge graph of more than 25k triplets for comprehension, and four clinically grounded reasoning tasks. The rich annotations facilitate in-depth cognitive analysis, simulating how surgeons perceive visual cues and combine them with domain knowledge to make decisions, thus greatly improving models' cognitive ability. In particular, EyePCR-MLLM, a domain-adapted variant of Qwen2.5-VL-7B, achieves the highest accuracy on MCQs for Perception among compared models and outperforms open-source models in Comprehension and Reasoning, rivalling commercial models like GPT-4.1. EyePCR reveals the limitations of existing MLLMs in surgical cognition and lays the foundation for benchmarking and enhancing the clinical reliability of surgical video understanding models.
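
The abstract mentions two kinds of structured data: knowledge graph triplets and multiple-choice questions (MCQs) scored by accuracy. The Python sketch below illustrates what such records and an accuracy computation could look like. It is only an illustration under stated assumptions: the class names, field names, and the `mcq_accuracy` helper are hypothetical, the sample entries are invented for the example, and none of this reflects the actual EyePCR data format or evaluation code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """Hypothetical (head, relation, tail) knowledge graph entry."""
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class MCQ:
    """Hypothetical multiple-choice item; `answer` is the correct option label."""
    question: str
    options: tuple[str, ...]
    answer: str


def mcq_accuracy(items: list[MCQ], predictions: list[str]) -> float:
    """Fraction of items whose predicted option label matches the answer."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        1
        for item, pred in zip(items, predictions)
        if pred.strip().upper() == item.answer.strip().upper()
    )
    return correct / len(items)


# Invented sample data, not taken from the benchmark.
example_triplet = Triplet("phacoemulsification", "used_in", "cataract surgery")
example_items = [
    MCQ("Which instrument is visible?", ("A. forceps", "B. phaco tip"), "B"),
    MCQ("Which phase is shown?", ("A. incision", "B. irrigation"), "A"),
]
example_score = mcq_accuracy(example_items, ["B", "b"])
```

Comparing normalized option labels rather than free text keeps the scoring deterministic. How the benchmark itself parses model outputs is not stated in the abstract.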
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs in high-stakes ophthalmic surgical scenarios
Assessing fine-grained perception, knowledge comprehension, and clinical reasoning
Addressing limitations in surgical cognition through structured clinical benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed EyePCR benchmark for surgical cognition evaluation
Integrated medical knowledge graph with clinical reasoning tasks
Created domain-adapted MLLM variant for ophthalmic analysis
Gui Wang
School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, China
Yang Wennuo
School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, China
Xusen Ma
School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, China
Zehao Zhong
School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, China
Zhuoru Wu
School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, China
Ende Wu
Wenzhou Medical University, Wenzhou, China
Rong Qu
University of Nottingham
Hyper-heuristics, Vehicle Routing, Automated Algorithm Design, Combinatorial Optimisation
Wooi Ping Cheah
School of Computer Science, University of Nottingham Ningbo China, Ningbo, China
Jianfeng Ren
University of Nottingham Ningbo China
Computer Vision, Pattern Recognition, Machine Learning, Human-Computer Interaction
Linlin Shen
Shenzhen University
Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis