🤖 AI Summary
AI models for cataract surgery are hindered by coarse annotations, limited procedural diversity, and poor cross-center generalizability. Method: We introduce the first large-scale, multi-center cataract surgery video dataset (3,000 cases) with a novel four-tier annotation scheme: (i) surgical phase temporal segmentation, (ii) instance-level segmentation of instruments and anatomical structures, (iii) fine-grained instrument–tissue interaction tracking, and (iv) quantitative skill scoring. Based on this, we propose a multi-task joint learning framework integrated with cross-center domain adaptation, jointly modeling video temporal dynamics, interaction relational reasoning, and skill regression. Contribution/Results: Our approach establishes new state-of-the-art benchmarks across three core tasks (phase recognition, scene segmentation, and skill assessment) and demonstrates substantial improvements in cross-center generalization. The dataset and annotations are made available through a request form to accelerate the development of clinically deployable surgical AI systems.
📝 Abstract
The development of computer-assisted surgery systems depends on large-scale, annotated datasets. Current resources for cataract surgery often lack the diversity and annotation depth needed to train generalizable deep-learning models. To address this gap, we present a dataset of 3,000 phacoemulsification cataract surgery videos from two surgical centers, performed by surgeons with a range of experience levels. The dataset carries four annotation layers: temporal surgical phases, instance segmentation of instruments and anatomical structures, instrument-tissue interaction tracking, and quantitative skill scores based on established competency rubrics such as the ICO-OSCAR. We validate the technical quality of the dataset with a series of benchmarking experiments on key surgical AI tasks, including workflow recognition, scene segmentation, and automated skill assessment. Furthermore, we establish a domain adaptation baseline for the phase recognition task by training a model on a subset of surgical centers and evaluating its performance on a held-out center. The dataset and annotations can be requested through a Google Form (https://docs.google.com/forms/d/e/1FAIpQLSfmyMAPSTGrIy2sTnz0-TMw08ZagTimRulbAQcWdaPwDy187A/viewform?usp=dialog).
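The held-out-center evaluation described in the abstract can be illustrated with a minimal sketch of a leave-one-center-out split. This is not the authors' code: the function name `leave_one_center_out`, the `(case_id, center)` record layout, and the identifiers `case_001`, `center_A`, and `center_B` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def leave_one_center_out(cases):
    """Yield (held_out_center, train_ids, test_ids), one split per center.

    `cases` is an iterable of (case_id, center) pairs. This record layout
    is an assumption for illustration, not the dataset's real metadata.
    """
    # Group case IDs by the center that recorded them.
    by_center = defaultdict(list)
    for case_id, center in cases:
        by_center[center].append(case_id)

    centers = sorted(by_center)
    for held_out in centers:
        # Train on every center except the held-out one.
        train_ids = [
            case_id
            for center in centers
            if center != held_out
            for case_id in by_center[center]
        ]
        # Evaluate only on cases from the held-out center.
        test_ids = list(by_center[held_out])
        yield held_out, train_ids, test_ids


# Toy example with made-up identifiers.
cases = [
    ("case_001", "center_A"),
    ("case_002", "center_A"),
    ("case_003", "center_B"),
]
splits = list(leave_one_center_out(cases))
for held_out, train_ids, test_ids in splits:
    print(held_out, len(train_ids), len(test_ids))
```

Splitting by center rather than by video keeps every case from the held-out site out of training, so the measured score reflects generalization to an unseen center and not to unseen videos from a familiar one.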