Towards Holistic Surgical Scene Graph

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing surgical scene graph methods overlook the diverse combinations of instruments, actions, targets, and surgeon hand identity, limiting the completeness and granularity of surgical understanding. To address this, we propose a holistic surgical scene graph modeling framework. First, we introduce Endoscapes-SG201, the first fine-grained surgical scene graph dataset with explicit hand-identity annotations. Second, we design SSG-Com, the first method to jointly model the four semantic elements, namely instrument, action, target, and operating hand, within a unified graph structure. Leveraging graph neural networks, SSG-Com performs relational reasoning to support downstream tasks including surgical action triplet recognition and critical view of safety (CVS) assessment. Experiments demonstrate that our approach significantly outperforms baseline methods on two critical safety-aware tasks, validating the essential role of hand-identity incorporation and four-way collaborative modeling in advancing surgical scene understanding.
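To make the graph structure described above concrete, here is a minimal toy sketch of a scene graph with the four element types the paper models (instrument, action, target, and operating hand). All class and field names are hypothetical illustrations, not the SSG-Com API; in the actual method these nodes and edges would feed a graph neural network.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    # Hypothetical node: kind is "instrument", "target", or "hand"
    node_id: int
    kind: str
    label: str

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    # Each edge is (source_id, relation, destination_id); the relation on a
    # tool-to-target edge plays the role of the "action" in a triplet.
    edges: list = field(default_factory=list)

    def add_node(self, kind: str, label: str) -> Node:
        node = Node(len(self.nodes), kind, label)
        self.nodes.append(node)
        return node

    def connect(self, src: Node, relation: str, dst: Node) -> None:
        self.edges.append((src.node_id, relation, dst.node_id))

# Toy frame: the surgeon's right hand operates a grasper that retracts
# the gallbladder, encoding both hand identity and a tool-action-target triplet.
g = SceneGraph()
hand = g.add_node("hand", "surgeon-right")
tool = g.add_node("instrument", "grasper")
organ = g.add_node("target", "gallbladder")
g.connect(hand, "operates", tool)   # hand-identity edge
g.connect(tool, "retract", organ)   # tool-action-target edge
```

Representing hand identity as a node with an "operates" edge (rather than a tool attribute) is one plausible design; it lets a GNN propagate information between the hand, the tool it holds, and the anatomy the tool acts on.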

📝 Abstract
Surgical scene understanding is crucial for computer-assisted intervention systems, requiring visual comprehension of surgical scenes that involves diverse elements such as surgical tools, anatomical structures, and their interactions. To effectively represent the complex information in surgical scenes, graph-based approaches have been explored to structurally model surgical entities and their relationships. Previous surgical scene graph studies have demonstrated the feasibility of representing surgical scenes using graphs. However, certain aspects of surgical scenes, such as diverse combinations of tool-action-target and the identity of the hand operating the tool, remain underexplored in graph-based representations, despite their importance. To incorporate these aspects into graph representations, we propose the Endoscapes-SG201 dataset, which includes annotations for tool-action-target combinations and hand identity. We also introduce SSG-Com, a graph-based method designed to learn and represent these critical elements. Through experiments on downstream tasks such as critical view of safety assessment and action triplet recognition, we demonstrate the importance of integrating these essential scene graph components, highlighting their significant contribution to surgical scene understanding. The code and dataset are available at https://github.com/ailab-kyunghee/SSG-Com
Problem

Research questions and friction points this paper is trying to address.

Model diverse tool-action-target combinations in surgical scenes
Incorporate hand identity into surgical scene graph representations
Enhance surgical scene understanding for computer-assisted intervention systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed the Endoscapes-SG201 dataset with tool-action-target and hand-identity annotations
Introduced SSG-Com, a graph-based method for learning these scene elements
Integrated tool-action-target combinations and hand identity into a unified graph representation
Jongmin Shin
Samsung Electronics
Computer Science
Enki Cho
Kyung Hee University, Yongin 17104, Republic of Korea
Ka Yong Kim
Kyung Hee University, Yongin 17104, Republic of Korea
Jung Yong Kim
Department of Surgery, Samsung Medical Center, Seoul 06351, Republic of Korea
Seong Tae Kim
Assistant Professor of Computer Science, Kyung Hee University
Explainable AI, Trustworthy AI, Vision-language Models, Surgical AI, MLLM
Namkee Oh
Department of Surgery, Samsung Medical Center
Minimally invasive liver surgery, transplantation, AI in surgery