Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis

📅 2025-10-18

🤖 AI Summary
Problem: AI models for cataract surgery are hindered by coarse annotations, limited procedural diversity, and poor cross-center generalizability. Method: We introduce the first large-scale, multi-center cataract surgery video dataset (3,000 cases) with a novel four-tier annotation scheme: (i) surgical phase temporal segmentation, (ii) instance-level segmentation of instruments and anatomical structures, (iii) fine-grained instrument–tissue interaction tracking, and (iv) quantitative skill scoring. Based on this, we propose a multi-task joint learning framework integrated with cross-center domain adaptation, jointly modeling video temporal dynamics, interaction relational reasoning, and skill regression. Contribution/Results: Our approach establishes new state-of-the-art benchmarks across three core tasks (phase recognition, scene segmentation, and skill assessment), demonstrating substantial improvements in cross-center generalization. All data and code are publicly released to accelerate the development of clinically deployable surgical AI systems.

📝 Abstract
The development of computer-assisted surgery systems depends on large-scale, annotated datasets. Current resources for cataract surgery often lack the diversity and annotation depth needed to train generalizable deep-learning models. To address this gap, we present a dataset of 3,000 phacoemulsification cataract surgery videos from two surgical centers, performed by surgeons with a range of experience levels. This resource is enriched with four annotation layers: temporal surgical phases, instance segmentation of instruments and anatomical structures, instrument-tissue interaction tracking, and quantitative skill scores based on established competency rubrics such as ICO-OSCAR. The technical quality of the dataset is supported by a series of benchmarking experiments for key surgical AI tasks, including workflow recognition, scene segmentation, and automated skill assessment. Furthermore, we establish a domain adaptation baseline for the phase recognition task by training a model on a subset of surgical centers and evaluating its performance on a held-out center. The dataset and annotations are available via a Google Form (https://docs.google.com/forms/d/e/1FAIpQLSfmyMAPSTGrIy2sTnz0-TMw08ZagTimRulbAQcWdaPwDy187A/viewform?usp=dialog).
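
To make the four annotation layers concrete, the sketch below shows one way a per-video record could be organized in Python. It is purely illustrative: the class names, field names, phase labels, rubric item, and the inclusive end-frame convention are all assumptions, not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseSegment:
    """One temporal phase annotation; end_frame is inclusive (assumed convention)."""
    label: str
    start_frame: int
    end_frame: int


@dataclass
class CaseAnnotation:
    """Hypothetical container for the four annotation layers of one video."""
    video_id: str
    center: str
    phases: list = field(default_factory=list)        # layer 1: PhaseSegment items
    instances: dict = field(default_factory=dict)     # layer 2: frame index -> instance labels
    interactions: list = field(default_factory=list)  # layer 3: (frame, instrument, tissue)
    skill_scores: dict = field(default_factory=dict)  # layer 4: rubric item -> score


def phase_durations(case, fps):
    """Return seconds spent in each phase label, summed over its segments."""
    durations = {}
    for seg in case.phases:
        seconds = (seg.end_frame - seg.start_frame + 1) / fps
        durations[seg.label] = durations.get(seg.label, 0.0) + seconds
    return durations


# Illustrative values only; none of these come from the dataset.
case = CaseAnnotation(
    video_id="case_0001",
    center="center_A",
    phases=[PhaseSegment("incision", 0, 299), PhaseSegment("capsulorhexis", 300, 899)],
    skill_scores={"capsulorhexis_quality": 4.0},
)
durations = phase_durations(case, fps=30.0)
```

Keeping all four layers on one record keyed by video makes it straightforward to derive per-phase statistics (as `phase_durations` does) or to pair skill scores with the phases they refer to.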
Problem

Research questions and friction points this paper is trying to address.

Addresses limited diversity in cataract surgery video datasets for AI training
Provides multi-layer annotations for surgical phases and instrument interactions
Establishes benchmarks for surgical workflow recognition and skill assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-source surgical video dataset with 3,000 videos
Four annotation layers for surgical analysis tasks
Domain adaptation baseline for phase recognition
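
The domain adaptation baseline is described as training on a subset of centers and evaluating on a held-out center. A minimal sketch of that kind of split is shown below; the function name, video identifiers, and center names are hypothetical, and the paper's actual split procedure may differ.

```python
def split_by_center(videos, held_out_center):
    """Split (video_id, center) pairs so the held-out center is used only for testing."""
    train_ids, test_ids = [], []
    for video_id, center in videos:
        if center == held_out_center:
            test_ids.append(video_id)
        else:
            train_ids.append(video_id)
    return train_ids, test_ids


# Illustrative records only; identifiers and center names are made up.
videos = [
    ("case_0001", "center_A"),
    ("case_0002", "center_B"),
    ("case_0003", "center_A"),
]
train_ids, test_ids = split_by_center(videos, held_out_center="center_B")
```

Splitting by center rather than by video keeps every recording from the held-out site out of training, so the test score reflects cross-center generalization rather than within-center performance.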
Mohammad Javad Ahmadi
Applied Robotics and AI Solutions (ARAS), Faculties of Electrical and Computer Engineering, K.N. Toosi University of Technology, Tehran, Iran.
Iman Gandomi
Applied Robotics and AI Solutions (ARAS), Faculties of Electrical and Computer Engineering, K.N. Toosi University of Technology, Tehran, Iran.
Parisa Abdi
Translational Ophthalmology Research Center, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran.
Seyed-Farzad Mohammadi
Professor of Cornea, Anterior Segment & Refractive Surgery; & Senior Clinician Scientist
Ophthalmology: Service, Education & Translational Research, Public Eye Health, Academic & Exec Leadership, Editorship
Amirhossein Taslimi
Applied Robotics and AI Solutions (ARAS), Faculties of Electrical and Computer Engineering, K.N. Toosi University of Technology, Tehran, Iran.
Mehdi Khodaparast
Translational Ophthalmology Research Center, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran.
Hassan Hashemi
Noor Ophthalmology Research Center, Noor Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran.
Mahdi Tavakoli
Departments of Electrical and Computer Engineering & Biomedical Engineering, University of Alberta, Edmonton, AB, Canada.
Hamid D. Taghirad
Professor and Director of Applied Robotics and AI Solutions (ARAS)
Robotics, Medical Robotics, Machine Learning, Computer Vision, Artificial Intelligence