Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis

📅 2025-10-18

🤖 AI Summary

Problem: AI models for cataract surgery are hindered by coarse annotations, limited procedural diversity, and poor cross-center generalizability. Method: We introduce a large-scale cataract surgery video dataset of 3,000 phacoemulsification cases from two surgical centers, with a four-layer annotation scheme: (i) temporal segmentation of surgical phases, (ii) instance segmentation of instruments and anatomical structures, (iii) instrument–tissue interaction tracking, and (iv) quantitative skill scoring based on competency rubrics such as ICO-OSCAR. Contribution/Results: We benchmark the dataset on three core tasks (phase recognition, scene segmentation, and skill assessment) and establish a cross-center domain adaptation baseline for phase recognition by training on a subset of centers and evaluating on a held-out center. The data and annotations are accessible via a Google Form, to support the development of clinically deployable surgical AI systems.
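
The four annotation layers listed above map naturally onto one record per video. The sketch below is a hypothetical schema, not the dataset's actual file format: the class names, field names, and example phase labels are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PhaseSegment:
    """Layer (i): one temporally segmented surgical phase (hypothetical fields)."""
    phase: str       # illustrative label, not the dataset's official phase set
    start_s: float   # segment start, seconds from video start (inclusive)
    end_s: float     # segment end, seconds (exclusive)


@dataclass
class InstanceMask:
    """Layer (ii): one instrument or anatomical-structure instance in a frame."""
    frame: int
    label: str                           # instrument or tissue class name
    polygon: List[Tuple[float, float]]   # outline as (x, y) pixel coordinates


@dataclass
class InteractionTrack:
    """Layer (iii): frames in which an instrument interacts with a tissue."""
    instrument: str
    tissue: str
    frames: List[int]


@dataclass
class VideoAnnotation:
    """All four annotation layers for one video, in this hypothetical schema."""
    video_id: str
    center: str  # source surgical center, used for cross-center splits
    phases: List[PhaseSegment] = field(default_factory=list)
    masks: List[InstanceMask] = field(default_factory=list)
    interactions: List[InteractionTrack] = field(default_factory=list)
    # Layer (iv): rubric item -> score (item names are assumptions)
    skill_scores: Dict[str, float] = field(default_factory=dict)

    def phase_at(self, t: float) -> Optional[str]:
        """Return the phase label covering time t (seconds), or None if unlabeled."""
        for seg in self.phases:
            if seg.start_s <= t < seg.end_s:
                return seg.phase
        return None
```

For example, a record whose phases are `incision` (0 to 30 s) and `capsulorhexis` (30 to 95 s) returns "capsulorhexis" for `phase_at(42.0)` and None for a time outside every labeled segment.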

📝 Abstract

The development of computer-assisted surgery systems depends on large-scale, annotated datasets. Current resources for cataract surgery often lack the diversity and annotation depth needed to train generalizable deep-learning models. To address this gap, we present a dataset of 3,000 phacoemulsification cataract surgery videos from two surgical centers, performed by surgeons with a range of experience levels. This resource is enriched with four annotation layers: temporal surgical phases, instance segmentation of instruments and anatomical structures, instrument-tissue interaction tracking, and quantitative skill scores based on established competency rubrics such as ICO-OSCAR. The technical quality of the dataset is supported by a series of benchmarking experiments for key surgical AI tasks, including workflow recognition, scene segmentation, and automated skill assessment. Furthermore, we establish a domain adaptation baseline for the phase recognition task by training a model on a subset of surgical centers and evaluating its performance on a held-out center. The dataset and annotations are available via a Google Form (https://docs.google.com/forms/d/e/1FAIpQLSfmyMAPSTGrIy2sTnz0-TMw08ZagTimRulbAQcWdaPwDy187A/viewform?usp=dialog).
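
The held-out-center protocol described in the abstract can be sketched as a split over video IDs grouped by source center. This is a minimal illustration assuming each video is tagged with its center; the function names, video IDs, and center labels (center_A, center_B) are hypothetical and not part of the released dataset or its code.

```python
from typing import Dict, Iterator, List, Tuple


def held_out_center_split(
    video_centers: Dict[str, str], held_out: str
) -> Tuple[List[str], List[str]]:
    """Split video IDs so that every video from `held_out` is test-only.

    video_centers maps a (hypothetical) video ID to its source center.
    Returns (train_ids, test_ids), each sorted for reproducibility.
    """
    if held_out not in set(video_centers.values()):
        raise ValueError(f"unknown center: {held_out!r}")
    # Training set: all videos from the remaining centers.
    train = sorted(v for v, c in video_centers.items() if c != held_out)
    # Test set: only videos from the held-out center.
    test = sorted(v for v, c in video_centers.items() if c == held_out)
    return train, test


def leave_one_center_out(
    video_centers: Dict[str, str]
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_center, train_ids, test_ids) once per center."""
    for center in sorted(set(video_centers.values())):
        train, test = held_out_center_split(video_centers, center)
        yield center, train, test
```

With two centers, leave_one_center_out yields two folds, each training on one center and testing on the other. Splitting by center rather than by randomly sampled videos keeps every center-specific characteristic of the held-out center unseen during training, which is what makes the evaluation a test of cross-center generalization.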

Problem

Addresses limited diversity in cataract surgery video datasets for AI training

Provides multi-layer annotations for surgical phases and instrument interactions

Establishes benchmarks for surgical workflow recognition and skill assessment

Innovation

Multi-source surgical video dataset with 3,000 videos

Four annotation layers for surgical analysis tasks

Domain adaptation baseline for phase recognition
