MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis

📅 2025-11-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current medical visual question answering (VQA) models trained with purely on-policy reinforcement learning tend to reinforce superficially coherent yet clinically inaccurate reasoning paths, failing to emulate physicians' progressive focusing and iterative diagnostic process. To address this, the authors propose MedEyes, a reinforcement learning framework that integrates eye-tracking-derived expert trajectory guidance with autonomous exploration to enable dynamic visual region focusing and interpretable stepwise reasoning. Methodologically, MedEyes combines a Gaze-guided Reasoning Navigator (GRN) with a dual-mode scanning/drilling exploration strategy, a Confidence Value Sampler (CVS) based on nucleus sampling with adaptive termination, and a dual-stream Group Relative Policy Optimization (GRPO) scheme that decouples on-policy and off-policy learning signals, balancing expert imitation with model-driven discovery. Evaluated across multiple medical VQA benchmarks, MedEyes achieves an average accuracy gain of 8.5%, improving clinical consistency and reasoning interpretability. This work points toward AI-assisted diagnosis systems aligned with real-world clinical reasoning.

📝 Abstract
Accurate medical diagnosis often involves progressive visual focusing and iterative reasoning, characteristics commonly observed in clinical workflows. While recent vision-language models demonstrate promising chain-of-thought (CoT) reasoning capabilities via reinforcement learning with verifiable rewards (RLVR), their purely on-policy learning paradigm tends to reinforce superficially coherent but clinically inaccurate reasoning paths. We propose MedEyes, a novel reinforcement learning framework that dynamically models clinician-style diagnostic reasoning by progressively attending to and interpreting relevant medical image regions. By incorporating off-policy expert guidance, MedEyes converts expert visual search trajectories into structured external behavioral signals, guiding the model toward clinically aligned visual reasoning. We design the Gaze-guided Reasoning Navigator (GRN) to emulate the diagnostic process through a dual-mode exploration strategy, scanning for systematic abnormality localization and drilling for detailed regional analysis. To balance expert imitation and autonomous discovery, we introduce the Confidence Value Sampler (CVS), which employs nucleus sampling and adaptive termination to create diverse yet credible exploration paths. Finally, the dual-stream GRPO optimization framework decouples on-policy and off-policy learning signals, mitigating reward assimilation and entropy collapse. Experiments demonstrate that MedEyes achieves an average performance improvement of +8.5% across multiple medical VQA benchmarks, validating MedEyes's potential in building interpretable medical AI systems.
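The abstract's Confidence Value Sampler combines nucleus (top-p) sampling with adaptive termination to generate diverse but credible exploration paths. The paper does not give implementation details, so the sketch below is only illustrative: the function names, the per-step region distributions, and the confidence-based stopping threshold are all assumptions, not the authors' code.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample an index from the smallest set of candidates whose
    cumulative probability exceeds p (standard top-p / nucleus sampling)."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]          # indices sorted by descending prob
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1     # smallest prefix with mass > p
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=renorm))

def explore_path(step_probs, p=0.9, confidence_stop=0.95, max_steps=8):
    """Roll out a region-selection path, stopping adaptively once the
    model's top confidence at a step exceeds confidence_stop
    (hypothetical stand-in for the paper's adaptive termination)."""
    path = []
    for probs in step_probs[:max_steps]:
        path.append(nucleus_sample(probs, p=p))
        if probs.max() >= confidence_stop:   # confident enough: terminate early
            break
    return path
```

Restricting each step's sampling to the nucleus keeps exploration diverse without admitting low-probability (implausible) regions, while the confidence threshold ends a rollout once further scanning is unlikely to change the answer.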
Problem

Research questions and friction points this paper is trying to address.

Purely on-policy RL training reinforces coherent-looking but clinically inaccurate reasoning paths in medical VQA.
Current models fail to emulate clinicians' progressive visual focusing and iterative diagnostic reasoning.
Interpretability and accuracy of reasoning in medical visual question answering remain limited.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-mode exploration strategy for systematic and detailed medical image analysis
Off-policy expert guidance converting visual trajectories into behavioral signals
Dual-stream optimization framework decoupling on-policy and off-policy learning
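The third innovation, decoupling on-policy and off-policy signals in a dual-stream GRPO, can be illustrated with GRPO's group-relative advantage (rewards z-scored within a rollout group). The paper gives no equations here, so this is a minimal sketch under the assumption that each stream is normalized independently; the function names are hypothetical.

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    """GRPO-style group-relative advantage: z-score rewards within one
    rollout group (reward minus group mean, divided by group std)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def dual_stream_advantages(on_policy_rewards, off_policy_rewards):
    """Normalize each stream separately (illustrative assumption): the
    typically higher expert-guided rewards then cannot dominate the
    on-policy baseline, which is one way reward assimilation is avoided."""
    return (group_advantages(on_policy_rewards),
            group_advantages(off_policy_rewards))
```

If the two streams were pooled into one group, the expert-guided rollouts would receive almost all positive advantage and the on-policy stream's gradient signal (and entropy) would collapse; per-stream normalization keeps both learning signals informative.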
Chunzheng Zhu, Hunan University
Yangfang Lin, Hunan University
Shen Chen, Hunan University
Yijun Wang, Hunan University
Jianxin Lin, Associate Professor of Computer Science, Hunan University
Generative Models · Deep Learning · Medical Image Processing