🤖 AI Summary
Existing surgical scene graph methods overlook the diverse combinations of instruments, actions, targets, and operating-hand identity, limiting the completeness and granularity of surgical understanding. To address this, we propose a panoramic surgical scene graph modeling framework. First, we introduce Endoscapes-SG201, the first fine-grained surgical scene graph dataset with explicit hand-identity annotations. Second, we design SSG-Com, the first method to jointly model the four semantic elements (instrument, action, target, and operating hand) within a unified graph structure. Leveraging graph neural networks, SSG-Com performs relational reasoning to support downstream tasks including surgical action triplet recognition and critical view of safety (CVS) assessment. Experiments show that our approach significantly outperforms baseline methods on these two safety-aware tasks, validating the essential role of hand-identity incorporation and four-way collaborative modeling in advancing surgical scene understanding.
📝 Abstract
Surgical scene understanding is crucial for computer-assisted intervention systems, which require visual comprehension of surgical scenes involving diverse elements such as surgical tools, anatomical structures, and their interactions. To represent this complex information effectively, graph-based approaches have been explored to structurally model surgical entities and their relationships. Previous surgical scene graph studies have demonstrated the feasibility of representing surgical scenes with graphs. However, certain aspects of surgical scenes, such as the diverse combinations of tool, action, and target and the identity of the hand operating the tool, remain underexplored in graph-based representations despite their importance. To incorporate these aspects into graph representations, we propose the Endoscapes-SG201 dataset, which includes annotations for tool-action-target combinations and hand identity. We also introduce SSG-Com, a graph-based method designed to learn and represent these critical elements. Through experiments on the downstream tasks of critical view of safety assessment and action triplet recognition, we demonstrate the importance of integrating these essential scene graph components, highlighting their significant contribution to surgical scene understanding. The code and dataset are available at https://github.com/ailab-kyunghee/SSG-Com.
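To make the four-element graph representation concrete, here is a minimal, hypothetical sketch (not the actual SSG-Com implementation) of a surgical scene graph whose nodes carry one of the four semantic roles (instrument, action, target, hand), with one round of mean-aggregation message passing as a stand-in for the GNN-based relational reasoning described above. All names and features below are illustrative.

```python
# Hypothetical sketch of a four-element surgical scene graph in the spirit
# of SSG-Com; node names, roles, and the toy 1-D features are assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    role: str  # "instrument" | "action" | "target" | "hand"
    feat: list = field(default_factory=lambda: [0.0])

def message_pass(nodes, edges):
    """One round of mean aggregation over undirected edges,
    mixed 50/50 with each node's own feature (residual-style)."""
    neigh = {i: [] for i in range(len(nodes))}
    for a, b in edges:
        neigh[a].append(b)
        neigh[b].append(a)
    new_feats = []
    for i, node in enumerate(nodes):
        msgs = [nodes[j].feat for j in neigh[i]] or [node.feat]
        dim = len(node.feat)
        agg = [sum(f[d] for f in msgs) / len(msgs) for d in range(dim)]
        new_feats.append([0.5 * s + 0.5 * m for s, m in zip(node.feat, agg)])
    return new_feats

# Toy graph for "grasper grasps gallbladder with the right hand".
nodes = [
    Node("grasper", "instrument", [1.0]),
    Node("grasp", "action", [2.0]),
    Node("gallbladder", "target", [3.0]),
    Node("right_hand", "hand", [4.0]),
]
# instrument-action, action-target, instrument-hand relations
edges = [(0, 1), (1, 2), (0, 3)]
feats = message_pass(nodes, edges)
```

After message passing, each node's feature reflects its relational context, e.g. the hand node absorbs information from the instrument it operates; a downstream head could then read these features out for triplet recognition or CVS assessment.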