The SAGES Critical View of Safety Challenge: A Global Benchmark for AI-Assisted Surgical Quality Assessment

📅 2025-09-21

🤖 AI Summary

Problem: In laparoscopic cholecystectomy, the Critical View of Safety (CVS) is universally recommended but inconsistently achieved, and its manual assessment is subjective and sensitive to clinical variability, which hinders reliable surgical quality evaluation. Method: The SAGES CVS Challenge, the first AI competition organized by a surgical society, curated a large-scale, multicenter dataset of 1,000 videos from 54 institutions across 24 countries, annotated by 20 surgical experts under a consensus-validated, multi-annotator protocol. The EndoGlacier framework was developed to manage the large, heterogeneous video collection and the multi-annotator workflow. The challenge targeted three barriers to real-world deployment: assessment performance, uncertainty in subjective assessment, and robustness to clinical variability. Contribution/Results: Thirteen international teams participated. Relative to the state of the art, they achieved up to a 17% relative gain in assessment performance, a reduction in calibration error of over 80%, and a 17% relative improvement in robustness. The analysis of results identifies methodological trends linked to model performance and offers guidance for future research toward robust, clinically deployable AI for surgical quality assessment.

📝 Abstract

Advances in artificial intelligence (AI) for surgical quality assessment promise to democratize access to expertise, with applications in training, guidance, and accreditation. This study presents the SAGES Critical View of Safety (CVS) Challenge, the first AI competition organized by a surgical society, using the CVS in laparoscopic cholecystectomy, a universally recommended yet inconsistently performed safety step, as an exemplar of surgical quality assessment. A global collaboration across 54 institutions in 24 countries engaged hundreds of clinicians and engineers to curate 1,000 videos annotated by 20 surgical experts according to a consensus-validated protocol. The challenge addressed key barriers to real-world deployment in surgery, including achieving high performance, capturing uncertainty in subjective assessment, and ensuring robustness to clinical variability. To enable this scale of effort, we developed EndoGlacier, a framework for managing large, heterogeneous surgical video and multi-annotator workflows. Thirteen international teams participated, achieving up to a 17% relative gain in assessment performance, over 80% reduction in calibration error, and a 17% relative improvement in robustness over the state-of-the-art. Analysis of results highlighted methodological trends linked to model performance, providing guidance for future research toward robust, clinically deployable AI for surgical quality assessment.
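
The headline numbers above are relative changes, and the calibration figure refers to how well predicted probabilities match observed outcomes. The abstract does not say which metrics the challenge used, so the following is only a minimal sketch under assumptions: it uses expected calibration error (ECE) as one common calibration measure, and the function names and toy numbers are hypothetical, not taken from the paper.

```python
def relative_gain(baseline, new):
    """Relative improvement of `new` over `baseline` (higher-is-better metric)."""
    return (new - baseline) / baseline


def relative_reduction(baseline, new):
    """Relative reduction of `new` versus `baseline` (lower-is-better metric)."""
    return (baseline - new) / baseline


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE (assumed metric, not confirmed by the source).

    Predictions are grouped into equal-width probability bins. For each
    non-empty bin, the gap between the mean predicted probability and the
    observed positive rate is weighted by the fraction of samples in the bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(positive_rate - mean_prob)
    return ece


# Hypothetical toy values, chosen only to illustrate the arithmetic.
gain = relative_gain(0.60, 0.702)          # a score rising from 0.60 to 0.702
reduction = relative_reduction(0.10, 0.02)  # a calibration error falling from 0.10 to 0.02
toy_ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
```

Under these toy values, a score moving from 0.60 to 0.702 is a 17% relative gain, and a calibration error dropping from 0.10 to 0.02 is an 80% relative reduction. The actual baselines and scores behind the reported figures are not given in this summary.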

Problem

Research questions and friction points this paper is trying to address.

Developing AI benchmarks for surgical quality assessment in laparoscopic cholecystectomy

Addressing barriers to real-world AI deployment in surgical settings

Creating robust AI systems for subjective surgical safety evaluations

Innovation

Methods, ideas, or system contributions that make the work stand out.

EndoGlacier framework for managing surgical video workflows

AI competition addressing uncertainty in subjective assessment

Multi-annotator consensus protocol for surgical quality validation

🔎 Similar Papers

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?