🤖 AI Summary
In laparoscopic cholecystectomy, accurate segmentation of hepatobiliary anatomy is challenged by severe occlusion, difficulty in modeling long-range spatial dependencies, and inconsistent identification of rare, fine-grained structures, particularly the critical components of the Calot triangle. To address these issues, the authors propose two hybrid ViT-GNN segmentation models: one pairs a static k-nearest-neighbor graph with GCNII for stable long-range message propagation, and the other pairs a dynamic Differentiable Graph Generator with a Graph Attention Network (GAT) for adaptive topology learning. In both, multi-scale features are extracted via Vision Transformers (ViT), and the graph module models semantic and geometric relationships among anatomical regions. Evaluated on Endoscapes-Seg50 and CholecSeg8k, the methods improve mIoU by up to 7–8% and mDice by 6% over prior state-of-the-art methods, with marked gains in robustness and consistency for safety-critical fine structures.
📝 Abstract
Purpose: Accurate identification of hepatocystic anatomy is critical to preventing surgical complications during laparoscopic cholecystectomy. Deep learning models often struggle to handle occlusions, model long-range dependencies, and capture the fine-scale geometry of rare structures. This work addresses these challenges by introducing graph-based segmentation approaches that enhance spatial and semantic understanding in surgical scene analysis.
Methods: We propose two segmentation models integrating Vision Transformer (ViT) feature encoders with Graph Neural Networks (GNNs) to explicitly model spatial relationships between anatomical regions. (1) A static k Nearest Neighbours (k-NN) graph with a Graph Convolutional Network with Initial Residual and Identity Mapping (GCNII) enables stable long-range information propagation. (2) A dynamic Differentiable Graph Generator (DGG) with a Graph Attention Network (GAT) supports adaptive topology learning. Both models are evaluated on the Endoscapes-Seg50 and CholecSeg8k benchmarks.
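The static variant in (1) can be illustrated with a minimal NumPy sketch: build a symmetric k-NN graph over token features, normalize it, and apply one GCNII layer using the update rule from the original GCNII formulation, H' = ReLU(((1 - alpha) P H + alpha H0)((1 - beta) I + beta W)). This is a sketch under stated assumptions, not the authors' implementation. The function names, the use of Euclidean distance in feature space, the token count and width, and the values of k, alpha, and lam are all hypothetical; the abstract does not specify them.

```python
import numpy as np

def knn_adjacency(features, k):
    """Symmetric k-NN adjacency (no self-loops) from pairwise Euclidean distances."""
    n = features.shape[0]
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)           # a node is never its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]   # indices of the k closest nodes per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)            # symmetrize: keep an edge if either side chose it

def normalize_adjacency(adj):
    """P = D^{-1/2} (A + I) D^{-1/2}, the renormalized adjacency used by GCN-style layers."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcnii_layer(h, h0, p, weight, alpha, beta):
    """One GCNII layer: initial residual (alpha) plus identity mapping (beta)."""
    support = (1.0 - alpha) * (p @ h) + alpha * h0            # mix propagated and initial features
    out = (1.0 - beta) * support + beta * (support @ weight)  # support @ ((1 - beta) I + beta W)
    return np.maximum(out, 0.0)                               # ReLU

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(16, 8))   # hypothetical: 16 ViT patch tokens of width 8
    p = normalize_adjacency(knn_adjacency(tokens, k=4))
    h, alpha, lam = tokens, 0.1, 0.5    # assumed hyperparameters
    for layer in range(1, 5):
        beta = np.log(lam / layer + 1.0)  # beta decays with depth, as in the GCNII formulation
        w = rng.normal(scale=0.1, size=(8, 8))
        h = gcnii_layer(h, tokens, p, w, alpha, beta)
    print(h.shape)
```

The initial residual keeps every layer anchored to the input features, and the identity mapping shrinks the influence of W as depth grows. Together these are what allow deep stacks to propagate information over long ranges without over-smoothing, which is the "stable long-range information propagation" the abstract refers to.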
Results: The proposed approaches achieve up to 7-8% improvement in Mean Intersection over Union (mIoU) and 6% improvement in Mean Dice (mDice) scores over state-of-the-art baselines. They produce anatomically coherent predictions, particularly on thin, rare and safety-critical structures.
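For reference, the two reported metrics can be computed from integer label maps as in the sketch below. It assumes a single prediction/target pair and skips classes absent from both maps. The abstract does not state the averaging protocol (per image or per dataset, or how absent classes are treated), so those choices and the function name are assumptions.

```python
import numpy as np

def mean_iou_dice(pred, target, num_classes):
    """Return (mIoU, mDice) averaged over classes present in pred or target."""
    ious, dices = [], []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        total = p.sum() + t.sum()
        if total == 0:
            continue                      # class absent from both maps: skip (assumed convention)
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        ious.append(inter / union)        # IoU  = |P ∩ T| / |P ∪ T|
        dices.append(2.0 * inter / total) # Dice = 2 |P ∩ T| / (|P| + |T|)
    return float(np.mean(ious)), float(np.mean(dices))

if __name__ == "__main__":
    pred = np.array([[0, 0], [1, 1]])
    target = np.array([[0, 1], [1, 1]])
    print(mean_iou_dice(pred, target, num_classes=3))
```

Because each class contributes equally to the mean regardless of its pixel count, errors on thin or rare structures weigh as heavily as errors on large organs. This is why these metrics are sensitive to the fine structures highlighted in the results.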
Conclusion: The proposed graph-based segmentation methods enhance both performance and anatomical consistency in surgical scene segmentation. By combining ViT-based global context with graph-based relational reasoning, the models improve interpretability and reliability, paving the way for safer laparoscopic and robot-assisted surgery through precise identification of critical anatomical features.