Guideline-grounded retrieval-augmented generation for ophthalmic clinical decision support

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing ophthalmic clinical decision support systems lack guideline-grounded, traceable, and precise question-answering capabilities. To address this limitation, this work proposes Oph-Guid-RAG, a multimodal visual Retrieval-Augmented Generation (RAG) system that uniquely treats guideline page images directly as evidence units during retrieval, thereby preserving their original layout information. The framework integrates query decomposition and rewriting, controllable retrieval with routing and filtering mechanisms, and multimodal reasoning to substantially enhance evidence traceability and system robustness. Evaluated on the HealthBench hard subset, Oph-Guid-RAG achieves a 30.0% improvement in overall score and a 10.4% gain in accuracy over GPT-5.2, and further outperforms GPT-5.4 by 24.4% in accuracy, significantly surpassing current baselines.

📝 Abstract
In this work, we propose Oph-Guid-RAG, a multimodal visual RAG system for ophthalmology clinical question answering and decision support. We treat each guideline page as an independent evidence unit and directly retrieve page images, preserving tables, flowcharts, and layout information. We further design a controllable retrieval framework with routing and filtering, which selectively introduces external evidence and reduces noise. The system integrates query decomposition, query rewriting, retrieval, reranking, and multimodal reasoning, and provides traceable outputs with guideline page references. We evaluate our method on HealthBench using a doctor-based scoring protocol. On the hard subset, our approach improves the overall score from 0.2969 to 0.3861 (+0.0892, +30.0%) compared to GPT-5.2, and achieves higher accuracy, improving from 0.5956 to 0.6576 (+0.0620, +10.4%). Compared to GPT-5.4, our method achieves a larger accuracy gain of +0.1289 (+24.4%). These results show that our method is more effective on challenging cases that require precise, evidence-based reasoning. Ablation studies further show that reranking, routing, and retrieval design are critical for stable performance, especially under difficult settings. Overall, we show how combining vision-based retrieval with controllable reasoning can improve evidence grounding and robustness in clinical AI applications, while noting that further work is needed to make the system more complete.
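The abstract describes a pipeline of query decomposition, query rewriting, retrieval, reranking/filtering, and traceable output with page references. A minimal sketch of that control flow is below; all function names, the word-overlap scoring, and the toy guideline corpus are hypothetical illustrations, not the paper's actual implementation (which retrieves page *images* with a vision-based retriever).

```python
# Hypothetical sketch of the Oph-Guid-RAG control flow described in the
# abstract. The paper retrieves guideline page images; here, plain text and
# word overlap stand in for the visual retriever and reranker.

def decompose(query):
    # Split a compound clinical question into sub-questions (toy heuristic).
    return [q.strip() for q in query.split(" and ") if q.strip()]

def rewrite(sub_query):
    # Normalize phrasing before retrieval (placeholder rewrite step).
    return sub_query.lower().rstrip("?")

def retrieve(sub_query, corpus, k=2):
    # Score each guideline "page" by word overlap with the sub-query; a real
    # system would embed page images and use vector search instead.
    words = set(sub_query.split())
    scored = sorted(corpus, key=lambda p: -len(words & set(p["text"].split())))
    return scored[:k]

def rerank(sub_query, pages):
    # Filtering/routing stand-in: drop pages sharing no term with the query.
    words = set(sub_query.split())
    return [p for p in pages if words & set(p["text"].split())]

def answer(query, corpus):
    # Collect filtered evidence per sub-question and return traceable
    # guideline page references, mirroring the system's traceable outputs.
    evidence = []
    for sq in decompose(query):
        sq = rewrite(sq)
        evidence.extend(rerank(sq, retrieve(sq, corpus)))
    return sorted({p["page"] for p in evidence})

corpus = [  # toy guideline "pages"
    {"page": 12, "text": "glaucoma intraocular pressure target treatment"},
    {"page": 34, "text": "diabetic retinopathy screening interval"},
    {"page": 56, "text": "cataract surgery indications"},
]
print(answer("glaucoma treatment and retinopathy screening", corpus))  # [12, 34]
```

The key design point the ablations highlight is that the reranking/filtering stage, not just retrieval, keeps noisy evidence out of the final reasoning step.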
Problem

Research questions and friction points this paper is trying to address.

ophthalmology
clinical decision support
retrieval-augmented generation
guideline grounding
multimodal reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented generation
multimodal reasoning
visual retrieval
clinical decision support
evidence grounding
Shuying Chen
University of International Business and Economics, No. 10 Huixin East Street, Chaoyang District, Beijing, P.R. China 100029
Sen Cui
Tsinghua University
Trustworthy LLM, AI agent, embodied intelligence
Zhong Cao
University of Michigan
Autonomous vehicles, reinforcement learning