CataractSurg-80K: Knowledge-Driven Benchmarking for Structured Reasoning in Ophthalmic Surgery Planning

📅 2025-08-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing large language models (LLMs) lack ophthalmology-specific knowledge, struggle to integrate heterogeneous clinical data, and do not produce actionable cataract surgical plans. Method: We propose a knowledge-driven multi-agent system in which each agent simulates the reasoning of a specialist ophthalmologist and converts raw clinical inputs into structured summaries. Building on it, we construct CataractSurg-80K, presented as the first large-scale benchmark for cataract surgery planning, in which each case is annotated with diagnostic questions, expert reasoning chains, and structured surgical recommendations. We also develop Qwen-CSP, a domain-specialized model built on Qwen-4B and fine-tuned through a multi-stage process tailored for surgical planning. Contribution/Results: In the reported experiments, Qwen-CSP outperforms strong general-purpose LLMs across multiple evaluation metrics. The work contributes a dataset, a benchmark, and a domain-adapted LLM intended to support research on interpretable, AI-assisted clinical reasoning and decision support.

📝 Abstract

Cataract surgery remains one of the most widely performed and effective procedures for vision restoration. Effective surgical planning requires integrating diverse clinical examinations for patient assessment, intraocular lens (IOL) selection, and risk evaluation. Large language models (LLMs) have shown promise in supporting clinical decision-making. However, existing LLMs often lack the domain-specific expertise to interpret heterogeneous ophthalmic data and provide actionable surgical plans. To enhance the model's ability to interpret heterogeneous ophthalmic reports, we propose a knowledge-driven Multi-Agent System (MAS), where each agent simulates the reasoning process of specialist ophthalmologists, converting raw clinical inputs into structured, actionable summaries in both training and deployment stages. Building on MAS, we introduce CataractSurg-80K, the first large-scale benchmark for cataract surgery planning that incorporates structured clinical reasoning. Each case is annotated with diagnostic questions, expert reasoning chains, and structured surgical recommendations. We further introduce Qwen-CSP, a domain-specialized model built on Qwen-4B, fine-tuned through a multi-stage process tailored for surgical planning. Comprehensive experiments show that Qwen-CSP outperforms strong general-purpose LLMs across multiple metrics. Our work delivers a high-quality dataset, a rigorous benchmark, and a domain-adapted LLM to facilitate future research in medical AI reasoning and decision support.
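The multi-agent idea in the abstract, where specialist agents each turn part of a raw report into structured fields, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the agent names, the field names, the sample report text, and the use of regular expressions. The agents in the paper are described as simulating specialist reasoning; the regex functions below are simple stand-ins so the example runs without any model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical agent type: raw report text in, structured fields out.
Agent = Callable[[str], Dict[str, float]]


def biometry_agent(report: str) -> Dict[str, float]:
    """Toy stand-in for a biometry specialist: extract axial length in mm."""
    m = re.search(r"axial length[:\s]+([\d.]+)\s*mm", report, re.IGNORECASE)
    return {"axial_length_mm": float(m.group(1))} if m else {}


def keratometry_agent(report: str) -> Dict[str, float]:
    """Toy stand-in for a corneal specialist: extract K1/K2 in diopters."""
    out: Dict[str, float] = {}
    for key in ("K1", "K2"):
        m = re.search(rf"\b{key}[:\s]+([\d.]+)\s*D\b", report)
        if m:
            out[f"{key.lower()}_d"] = float(m.group(1))
    return out


@dataclass
class StructuredSummary:
    """Merged output of all agents (hypothetical schema)."""
    findings: Dict[str, float] = field(default_factory=dict)
    sources: List[str] = field(default_factory=list)


def run_agents(report: str, agents: Dict[str, Agent]) -> StructuredSummary:
    """Run every agent on the report and merge whatever each one found."""
    summary = StructuredSummary()
    for name, agent in agents.items():
        result = agent(report)
        if result:
            summary.findings.update(result)
            summary.sources.append(name)
    return summary


# Made-up report text, used only to exercise the sketch.
report = "Biometry: axial length: 23.45 mm. Keratometry K1: 43.25 D, K2: 44.00 D."
summary = run_agents(
    report,
    {"biometry": biometry_agent, "keratometry": keratometry_agent},
)
print(summary.findings)
```

Keeping each agent as an independent function that returns a partial dictionary means an agent that finds nothing simply contributes nothing, and the merged summary records which agents produced output.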

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' interpretation of heterogeneous ophthalmic reports

Providing structured surgical recommendations for cataract procedures

Addressing domain-specific expertise gap in clinical decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge-driven Multi-Agent System for ophthalmic data interpretation

CataractSurg-80K benchmark with structured clinical reasoning annotations

Qwen-CSP domain-specialized model fine-tuned for surgical planning
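The abstract states that each benchmark case is annotated with diagnostic questions, expert reasoning chains, and structured surgical recommendations. One way such a record might be represented is sketched below. The class name, field names, and placeholder values are hypothetical and are not the dataset's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class BenchmarkCase:
    """Hypothetical record layout; not the real CataractSurg-80K schema."""
    case_id: str
    clinical_summary: str
    diagnostic_questions: List[str]
    reasoning_chain: List[str]
    recommendation: Dict[str, str]


def validate(case: BenchmarkCase) -> List[str]:
    """Return a list of problems; an empty list means all three annotations exist."""
    problems = []
    if not case.diagnostic_questions:
        problems.append("no diagnostic questions")
    if not case.reasoning_chain:
        problems.append("empty reasoning chain")
    if not case.recommendation:
        problems.append("missing recommendation")
    return problems


# Placeholder content only; no real clinical data.
case = BenchmarkCase(
    case_id="example-0001",
    clinical_summary="<structured summary produced upstream>",
    diagnostic_questions=["<question 1>"],
    reasoning_chain=["<step 1>", "<step 2>"],
    recommendation={"iol_choice": "<placeholder>", "risk_notes": "<placeholder>"},
)

# Serialize to JSON and load it back.
encoded = json.dumps(asdict(case))
decoded = BenchmarkCase(**json.loads(encoded))
print(validate(decoded))
```

A flat record like this serializes to one JSON object per case, and the `validate` helper gives a quick check that no annotation type is missing before a case is used for training or evaluation.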

🔎 Similar Papers

LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models