🤖 AI Summary
This paper addresses intraoperative phase recognition in arthroscopic surgery, specifically anterior cruciate ligament (ACL) reconstruction, where a severely limited field of view and frequent instrument and tissue occlusions hinder robust temporal understanding. To this end, we introduce ACL27, the first dedicated dataset for arthroscopic ACL reconstruction with fine-grained phase annotations, and propose a Transformer-based spatiotemporal modeling framework. Our method combines ResNet-50 for visual feature extraction with a temporal self-attention mechanism and introduces the Surgical Progress Index (SPI) to quantify procedural advancement. Key contributions include: (1) releasing ACL27, the first publicly available, phase-annotated ACL reconstruction dataset; (2) designing a phase recognition framework that transfers across surgical tasks, validated on both ACL reconstruction and laparoscopic cholecystectomy; and (3) achieving 72.91% accuracy and a 10.6% SPI error on ACL27, and 92.4% accuracy and a 9.86% SPI error on Cholec80. Code and dataset are publicly released.
📝 Abstract
This study aims to advance surgical phase recognition in arthroscopic procedures, specifically Anterior Cruciate Ligament (ACL) reconstruction, by introducing the first arthroscopy dataset and developing a novel transformer-based model. We seek to establish a benchmark for arthroscopic surgical phase recognition by leveraging spatio-temporal features to address the specific challenges of arthroscopic videos, including limited field of view, occlusions, and visual distortions. We developed the ACL27 dataset, comprising 27 videos of ACL surgeries, each labeled with surgical phases. Our transformer-based model performs temporally aware, frame-wise feature extraction using a ResNet-50 and transformer layers. This approach integrates spatio-temporal features and introduces a Surgical Progress Index (SPI) to quantify surgery progression. The model's performance was evaluated using accuracy, precision, recall, and Jaccard Index on the ACL27 and Cholec80 datasets. The proposed model achieved an overall accuracy of 72.91% on the ACL27 dataset. On the Cholec80 dataset, the model achieved an accuracy of 92.4%, comparable to state-of-the-art methods. The SPI showed an output error of 10.6% and 9.86% on the ACL27 and Cholec80 datasets, respectively, indicating reliable estimation of surgery progression. This study advances surgical phase recognition for arthroscopy by providing a comprehensive dataset and a robust transformer-based model. The results support the model's effectiveness and generalizability, highlighting its potential to improve surgical training, real-time assistance, and operational efficiency in orthopedic surgery. The publicly available dataset and code will facilitate future research and development in this critical field.
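The evaluation measures named in the abstract can be illustrated with a small sketch. It is not the authors' evaluation code: it assumes one integer phase label per frame and reports frame-wise accuracy plus per-phase precision, recall, and Jaccard index. The abstract does not define how the SPI error is computed, so `progress_error` is a purely hypothetical stand-in that assumes the target progress grows linearly with elapsed frames and reports the mean absolute deviation in percentage points. The function names and the toy labels are likewise illustrative.

```python
from collections import Counter


def phase_metrics(y_true, y_pred):
    """Frame-wise accuracy and per-phase precision, recall, Jaccard index."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # frame correctly assigned to phase t
        else:
            fp[p] += 1      # phase p predicted where it did not occur
            fn[t] += 1      # phase t missed
    accuracy = sum(tp.values()) / len(y_true)
    per_phase = {}
    for phase in sorted(set(y_true) | set(y_pred)):
        predicted = tp[phase] + fp[phase]
        actual = tp[phase] + fn[phase]
        union = tp[phase] + fp[phase] + fn[phase]
        per_phase[phase] = {
            "precision": tp[phase] / predicted if predicted else 0.0,
            "recall": tp[phase] / actual if actual else 0.0,
            "jaccard": tp[phase] / union if union else 0.0,
        }
    return accuracy, per_phase


def progress_error(pred_progress):
    """Hypothetical progress error in percentage points.

    ASSUMPTION: the target progress at frame i (1-based) of n is i / n.
    The paper's actual SPI definition may differ.
    """
    n = len(pred_progress)
    if n == 0:
        raise ValueError("progress sequence must be non-empty")
    target = [(i + 1) / n for i in range(n)]
    return 100.0 * sum(abs(p - t) for p, t in zip(pred_progress, target)) / n


if __name__ == "__main__":
    # Toy six-frame video with three phases (made-up labels).
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    accuracy, per_phase = phase_metrics(y_true, y_pred)
    print(f"accuracy: {accuracy:.4f}")
    for phase, scores in per_phase.items():
        print(phase, scores)
    print(f"progress error: {progress_error([0.5, 0.5, 0.5, 0.5]):.2f}")
```

Per-phase scores are commonly averaged over phases, and sometimes over videos, before being reported; which averaging the paper uses is not stated in the abstract.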