🤖 AI Summary
Existing surgical scene understanding methods struggle to support real-time, text-prompted 3D semantic queries—particularly for precise identification of, and interaction with, surgical instruments and anatomical structures. To address this, we propose the first integration of vision-language models (VLMs) with differentiable Gaussian splatting, introducing semantic-aware deformation tracking and region-aware optimization to achieve dynamic 3D reconstruction that jointly preserves geometric fidelity and semantic consistency. Our method combines the Segment Anything Model (SAM) for mask generation, VLM-driven text-scene alignment, differentiable rendering, and semantic-region-supervised optimization. Evaluated on two real-world surgical datasets, our approach significantly outperforms state-of-the-art methods, enabling high-fidelity textured reconstruction and fine-grained, text-driven queries (e.g., “locate the electrocautery hook currently in use”). This work establishes a new paradigm for intelligent surgical planning and intraoperative navigation.
📝 Abstract
In contemporary surgical research and practice, accurately comprehending 3D surgical scenes with text-promptable capabilities is crucial for surgical planning and real-time intra-operative guidance, where precisely identifying and interacting with surgical tools and anatomical structures is paramount. However, existing works address surgical vision-language models (VLMs), 3D reconstruction, and segmentation separately, lacking support for real-time text-promptable 3D queries. In this paper, we present SurgTPGS, a novel text-promptable Gaussian Splatting method that fills this gap. We introduce a 3D semantic feature learning strategy incorporating the Segment Anything Model and state-of-the-art vision-language models. We extract segmented language features for 3D surgical scene reconstruction, enabling a deeper understanding of the complex surgical environment. We also propose semantic-aware deformation tracking to capture the seamless deformation of semantic features, yielding more precise reconstruction of both texture and semantics. Furthermore, we present semantic region-aware optimization, which utilizes region-based semantic information to supervise training, improving both reconstruction quality and semantic smoothness. We conduct comprehensive experiments on two real-world surgical datasets to demonstrate the superiority of SurgTPGS over state-of-the-art methods, highlighting its potential to revolutionize surgical practices. SurgTPGS paves the way for next-generation intelligent surgical systems by enhancing surgical precision and safety. Our code is available at: https://github.com/lastbasket/SurgTPGS.
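To make the text-promptable query idea concrete, here is a minimal sketch of the standard open-vocabulary relevance computation such systems typically build on: each 3D Gaussian carries a learned language feature, a text prompt is embedded into the same space, and cosine similarity ranks the Gaussians by relevance. All names here (`gaussian_feats`, `text_query_relevance`) are illustrative assumptions, not the SurgTPGS API; the paper's actual pipeline additionally involves SAM masks, deformation tracking, and region-aware supervision.

```python
import numpy as np

def normalize(x, axis=-1, eps=1e-8):
    # L2-normalize feature vectors so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def text_query_relevance(gaussian_feats, text_embedding):
    """Score each 3D Gaussian's language feature against a text query.

    gaussian_feats: (N, D) array, one learned language feature per Gaussian
                    (hypothetical representation, assumed aligned with the
                    text encoder's embedding space).
    text_embedding: (D,) array, embedding of a prompt such as
                    "locate the electrocautery hook currently in use".
    Returns an (N,) array of cosine similarities; higher = more relevant.
    """
    g = normalize(gaussian_feats)
    t = normalize(text_embedding)
    return g @ t

# Toy demo: 4 Gaussians with 8-dim features; the query embedding is
# constructed near Gaussian 2, so Gaussian 2 should rank highest.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
query = feats[2] + 0.01 * rng.normal(size=8)
relevance = text_query_relevance(feats, query)
print(int(np.argmax(relevance)))  # index of the most relevant Gaussian
```

In a full system, the top-scoring Gaussians (or a thresholded relevance map rendered to the image plane) would localize the queried instrument or anatomical structure in the reconstructed 3D scene.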