Graph Neural Networks for Surgical Scene Segmentation

📅 2025-11-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In laparoscopic cholecystectomy, accurate segmentation of hepatobiliary anatomy is challenged by severe occlusion, the difficulty of modeling long-range spatial dependencies, and inconsistent identification of rare, fine-grained structures—particularly critical components of Calot's triangle. To address these issues, we propose a hybrid ViT-GNN architecture that integrates static k-nearest-neighbor graphs with dynamic differentiable graph generation, enabling adaptive modeling of both semantic and geometric relationships among anatomical regions. Multi-scale features are extracted via Vision Transformers (ViT), while GCNII and Graph Attention Networks (GAT) jointly ensure stable message propagation and robust topological learning. Evaluated on Endoscapes-Seg50 and CholecSeg8k, our method achieves +7–8% mIoU and +6% mDice improvements over prior state-of-the-art methods, with marked gains in robustness and consistency for safety-critical fine structures.
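The static-graph branch of the summary above starts from a k-nearest-neighbor graph over region features. A minimal sketch of that construction (generic, not the authors' code; the function name and distance choice are illustrative) looks like this:

```python
import math

def knn_graph(features, k):
    """Build a symmetric k-NN edge set over feature vectors.

    features: list of equal-length lists (e.g. pooled ViT patch embeddings).
    Returns a set of undirected edges (i, j) with i < j.
    """
    n = len(features)

    def dist(a, b):
        # Euclidean distance between two feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    edges = set()
    for i in range(n):
        # Sort the other nodes by distance and keep the k closest.
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: dist(features[i], features[j]))[:k]
        for j in nbrs:
            edges.add((min(i, j), max(i, j)))  # symmetrise the graph
    return edges
```

The resulting fixed edge set is what a GNN such as GCNII would then propagate messages over; in practice the distance metric and `k` are design choices tuned per dataset.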

📝 Abstract
Purpose: Accurate identification of hepatocystic anatomy is critical to preventing surgical complications during laparoscopic cholecystectomy. Deep learning models often struggle with occlusions, long-range dependencies, and capturing the fine-scale geometry of rare structures. This work addresses these challenges by introducing graph-based segmentation approaches that enhance spatial and semantic understanding in surgical scene analysis. Methods: We propose two segmentation models integrating Vision Transformer (ViT) feature encoders with Graph Neural Networks (GNNs) to explicitly model spatial relationships between anatomical regions. (1) A static k-Nearest-Neighbours (k-NN) graph with a Graph Convolutional Network with Initial Residual and Identity Mapping (GCNII) enables stable long-range information propagation. (2) A dynamic Differentiable Graph Generator (DGG) with a Graph Attention Network (GAT) supports adaptive topology learning. Both models are evaluated on the Endoscapes-Seg50 and CholecSeg8k benchmarks. Results: The proposed approaches achieve up to 7–8% improvement in Mean Intersection over Union (mIoU) and 6% improvement in Mean Dice (mDice) scores over state-of-the-art baselines. They produce anatomically coherent predictions, particularly on thin, rare, and safety-critical structures. Conclusion: The proposed graph-based segmentation methods enhance both performance and anatomical consistency in surgical scene segmentation. By combining ViT-based global context with graph-based relational reasoning, the models improve interpretability and reliability, paving the way for safer laparoscopic and robot-assisted surgery through precise identification of critical anatomical features.
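The GCNII layer named in the abstract combines an initial-residual connection with an identity mapping on the weights, which is what stabilises long-range propagation in deep stacks. A minimal NumPy sketch of the standard GCNII update (Chen et al., 2020), with illustrative hyperparameters `alpha` and `beta`, follows:

```python
import numpy as np

def gcnii_layer(H, H0, A, W, alpha=0.1, beta=0.5):
    """One GCNII propagation step.

    H:  current node features, shape (n, d)
    H0: initial node features (the "initial residual"), shape (n, d)
    A:  adjacency matrix including self-loops, shape (n, n)
    W:  layer weight matrix, shape (d, d)
    """
    # Symmetrically normalised adjacency: D^{-1/2} A D^{-1/2}
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    P = D_inv_sqrt @ A @ D_inv_sqrt
    # Initial residual: mix a fraction alpha of the input features back in.
    support = (1 - alpha) * (P @ H) + alpha * H0
    # Identity mapping: keep (1 - beta) of the features untransformed,
    # which prevents over-smoothing as depth grows. ReLU nonlinearity.
    return np.maximum(0.0, support @ ((1 - beta) * np.eye(W.shape[0]) + beta * W))
```

With `alpha = 0` and `beta = 0` the layer reduces to plain normalised neighbourhood averaging, which makes the role of the two residual terms easy to see.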
Problem

Research questions and friction points this paper is trying to address.

Improving surgical scene segmentation accuracy for hepatocystic anatomy identification
Addressing challenges with occlusions and fine-scale geometry in rare structures
Enhancing spatial and semantic understanding in laparoscopic cholecystectomy analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformer feature encoders with Graph Neural Networks
Static k-NN graph with GCNII for long-range propagation
Dynamic Differentiable Graph Generator with GAT for adaptive topology learning
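The dynamic branch listed above replaces a fixed edge set with a learned, differentiable one. A common way to sketch this (a generic construction under our own assumptions, not necessarily the paper's exact DGG) is a row-wise softmax over pairwise feature similarities, yielding a soft adjacency that attention-based aggregation can use end-to-end:

```python
import numpy as np

def soft_adjacency(X, temperature=1.0):
    """Differentiable graph generation: row-wise softmax over pairwise
    dot-product similarities gives a dense, gradient-friendly adjacency."""
    S = X @ X.T / temperature
    np.fill_diagonal(S, -np.inf)            # exclude self-edges
    S = S - S.max(axis=1, keepdims=True)    # numerical stability
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def attention_aggregate(X, A):
    """GAT-style aggregation: each node's output is the attention-weighted
    sum of its neighbours' features."""
    return A @ X
```

Because every entry of the soft adjacency is a smooth function of the node features, gradients flow from the segmentation loss back into the graph structure itself, which is what "adaptive topology learning" refers to.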