🤖 AI Summary
To address three key bottlenecks in surgical workflow recognition—high annotation cost, data scarcity, and model opacity—this paper proposes an interpretable and robust modeling paradigm grounded in dynamic scene graphs and prototype learning. Methodologically, we introduce a novel GNN-based encoder-decoder architecture that combines self-supervised pretraining with prototype-driven fine-tuning, augmented by a prototype memory module that automatically discovers clinically meaningful surgical interaction prototypes. These prototypes enable fine-grained procedural parsing and interpretable localization of deviations and complications. Evaluated on the CAT-SG dataset, our approach significantly outperforms baseline GNN models. Notably, it maintains high accuracy when trained on a single annotated surgical video, demonstrating exceptional few-shot robustness. The generated dynamic scene graphs yield clear, structured, and clinically actionable workflow insights, enhancing both transparency and clinical utility.
📝 Abstract
Purpose: Detailed surgical workflow recognition is critical for advancing AI-assisted surgery, yet progress is hampered by high annotation costs, data scarcity, and a lack of interpretable models. While scene graphs offer a structured abstraction of surgical events, their full potential remains untapped. In this work, we introduce ProtoFlow, a novel framework that learns dynamic scene graph prototypes to model complex surgical workflows in an interpretable and robust manner.
Methods: ProtoFlow leverages a graph neural network (GNN) encoder-decoder architecture that combines self-supervised pretraining for rich representation learning with a prototype-based fine-tuning stage. This process discovers and refines core prototypes that encapsulate recurring, clinically meaningful patterns of surgical interaction, forming an explainable foundation for workflow analysis.
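To make the prototype idea concrete, the sketch below shows a minimal prototype memory that scores a graph-level embedding against a bank of prototype vectors and reports the best-matching prototype. All names (`PrototypeMemory`, `similarities`, `assign`), the cosine-similarity scoring, and the random initialization are illustrative assumptions for exposition only, not the paper's actual implementation; in ProtoFlow the embeddings would come from the GNN encoder and the prototypes would be learned during fine-tuning.

```python
import numpy as np

class PrototypeMemory:
    """Toy prototype memory (hypothetical sketch, not the paper's code).

    Stores K prototype vectors and scores an input embedding by cosine
    similarity to each prototype. The best-matching prototype serves as
    an interpretable explanation of the input.
    """

    def __init__(self, num_prototypes: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Prototypes would be learned during prototype-based fine-tuning;
        # here they are randomly initialized just to make the sketch run.
        self.prototypes = rng.normal(size=(num_prototypes, dim))

    def similarities(self, embedding: np.ndarray) -> np.ndarray:
        # Cosine similarity between the embedding and each prototype.
        p = self.prototypes / np.linalg.norm(self.prototypes, axis=1, keepdims=True)
        e = embedding / np.linalg.norm(embedding)
        return p @ e

    def assign(self, embedding: np.ndarray) -> int:
        # Interpretability hook: index of the prototype that best
        # explains this scene-graph embedding.
        return int(np.argmax(self.similarities(embedding)))

# Usage: a stand-in vector plays the role of a GNN-encoded scene graph.
mem = PrototypeMemory(num_prototypes=4, dim=8)
emb = np.ones(8)
sims = mem.similarities(emb)
best = mem.assign(emb)
```

A classifier head could then map prototype similarities to workflow phases, so each prediction is traceable to the prototypes that drove it.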
Results: We evaluate our approach on the fine-grained CAT-SG dataset. ProtoFlow not only outperforms standard GNN baselines in overall accuracy but also demonstrates exceptional robustness in limited-data, few-shot scenarios, maintaining strong performance when trained on as little as a single surgical video. Our qualitative analyses further show that the learned prototypes successfully identify distinct surgical sub-techniques and provide clear, interpretable insights into workflow deviations and rare complications.
Conclusion: By uniting robust representation learning with inherent explainability, ProtoFlow represents a significant step toward developing more transparent, reliable, and data-efficient AI systems, accelerating their potential for clinical adoption in surgical training, real-time decision support, and workflow optimization.