SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning

📅 2025-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current autonomous surgical systems struggle with long, highly dynamic minimally invasive procedures such as laparoscopic cholecystectomy, because they must handle tissue deformation, sustain task execution over extended periods, and remain clinically interpretable. This paper proposes a language-driven hierarchical framework that, for the first time, achieves fully autonomous step-level execution. A high-level policy plans the procedure in natural language, issuing task-specific or corrective instructions, and a low-level policy trained by language-conditioned imitation learning turns each instruction into task-space controls, closing the loop from semantic directive to robot motion. The hierarchy markedly improves recovery from suboptimal states and adaptability to a changing environment. In ex-vivo experiments, the system performed cholecystectomies on eight different gallbladders with a 100% success rate and no human intervention, which the authors present as the first demonstration of step-level autonomy in surgery.

📝 Abstract

Research on autonomous robotic surgery has largely focused on simple task automation in controlled environments. However, real-world surgical applications require dexterous manipulation over extended time scales while demanding generalization across diverse variations in human tissue. These challenges remain difficult to address using existing logic-based or conventional end-to-end learning strategies. To bridge this gap, we propose a hierarchical framework for dexterous, long-horizon surgical tasks. Our method employs a high-level policy for task planning and a low-level policy for generating task-space controls for the surgical robot. The high-level planner plans tasks using language, producing task-specific or corrective instructions that guide the robot at a coarse level. Leveraging language as a planning modality offers an intuitive and generalizable interface, mirroring how experienced surgeons instruct trainees during procedures. We validate our framework in ex-vivo experiments on a complex minimally invasive procedure, cholecystectomy, and conduct ablation studies to assess key design choices. Our approach achieves a 100% success rate across n=8 different ex-vivo gallbladders, operating fully autonomously without human intervention. The hierarchical approach greatly improves the policy's ability to recover from suboptimal states that are inevitable in the highly dynamic environment of realistic surgical applications. This work represents the first demonstration of step-level autonomy, marking a critical milestone toward autonomous surgical systems for clinical studies. By advancing generalizable autonomy in surgical robotics, our approach brings the field closer to real-world deployment.
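The abstract describes a two-level control loop: a high-level planner that issues task-specific or corrective language instructions, and a low-level policy that maps the current instruction and observation to task-space controls. The following is a minimal Python sketch of that loop under heavily simplified, hypothetical assumptions: a 1-D toy "task space", an invented instruction vocabulary (`TASK_STEPS`, `GOALS`, `CORRECTIVE`), and hand-coded rules standing in for the trained high- and low-level policies. None of these names or rules come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical instruction vocabulary and 1-D goal positions (not from the paper).
TASK_STEPS = ["reach target A", "reach target B", "reach target C"]
GOALS = {"reach target A": 1.0, "reach target B": 2.0, "reach target C": 3.0}
TOLERANCE = 0.05  # distance at which a step counts as complete
MAX_STEP = 0.3    # per-tick motion limit of the toy low-level policy
CORRECTIVE = "corrective: move back"  # hypothetical corrective instruction


@dataclass
class State:
    position: float = 0.0   # toy stand-in for the robot's task-space pose
    step_index: int = 0     # index of the current step in TASK_STEPS
    log: List[str] = field(default_factory=list)  # instructions issued so far


def high_level_planner(state: State) -> Optional[str]:
    """Hand-coded stand-in for the language planner: next instruction, or None when done."""
    while state.step_index < len(TASK_STEPS):
        step = TASK_STEPS[state.step_index]
        error = GOALS[step] - state.position
        if abs(error) <= TOLERANCE:
            state.step_index += 1  # step complete, move on to the next one
            continue
        if error < 0:
            return CORRECTIVE  # tool overshot the goal: issue a corrective instruction
        return step  # otherwise keep issuing the task-specific instruction
    return None


def low_level_policy(instruction: str, state: State) -> float:
    """Hand-coded stand-in for the learned policy: instruction + observation -> displacement."""
    if instruction == CORRECTIVE:
        return -0.1  # small fixed back-off
    error = GOALS[instruction] - state.position
    return max(-MAX_STEP, min(MAX_STEP, error))  # bounded step toward the goal


def run(disturbances: Dict[int, float], max_ticks: int = 100) -> State:
    """Closed loop: plan in language, act in task space, apply external disturbances."""
    state = State()
    for tick in range(max_ticks):
        instruction = high_level_planner(state)
        if instruction is None:
            break  # every step completed
        state.log.append(instruction)
        state.position += low_level_policy(instruction, state)
        state.position += disturbances.get(tick, 0.0)  # e.g. unexpected tissue motion
    return state


if __name__ == "__main__":
    result = run({4: 0.9})
    print(result.position, result.step_index)
    print(result.log)
```

Calling `run({4: 0.9})` injects a disturbance that pushes the toy tool past its goal; the planner then emits the corrective instruction until the error is within tolerance and resumes the task sequence. This mirrors, in miniature, the recovery from suboptimal states that the abstract attributes to the hierarchy; it is not a model of the actual surgical system.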

Problem

Research questions and friction points this paper is trying to address.

Addressing dexterous manipulation in long-duration surgeries

Overcoming generalization challenges in diverse human tissues

Enhancing recovery from suboptimal states in dynamic environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical framework for long-horizon surgical tasks

Language conditioned high-level policy for planning

Low-level policy that generates task-space controls for the robot
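The summary names language-conditioned imitation learning for the low-level policy but gives no architecture. As a minimal sketch of the general idea only, the following Python code (using NumPy) assumes a one-hot instruction embedding, a linear policy, and synthetic 1-D expert demonstrations, and fits the policy by behavior cloning, i.e. least-squares regression onto the expert's actions. The vocabulary, goals, and expert rule are hypothetical; a real surgical system would rely on much richer language and perception models, which this sketch does not attempt to reproduce.

```python
from typing import Tuple

import numpy as np

# Hypothetical instruction vocabulary and goals, used only to synthesize toy demonstrations.
INSTRUCTIONS = ["reach target A", "reach target B"]
GOALS = np.array([1.0, 2.0])


def encode(instruction: str) -> np.ndarray:
    """One-hot instruction embedding (a stand-in for a learned language encoder)."""
    vec = np.zeros(len(INSTRUCTIONS))
    vec[INSTRUCTIONS.index(instruction)] = 1.0
    return vec


def features(obs: float, instruction: str) -> np.ndarray:
    """Language conditioning: concatenate the observation with the instruction embedding."""
    return np.concatenate(([obs], encode(instruction)))


def make_demos(n: int, rng: np.random.Generator) -> Tuple[np.ndarray, np.ndarray]:
    """Synthetic 'expert' demos: the expert moves halfway toward the instructed goal."""
    X, Y = [], []
    for _ in range(n):
        k = int(rng.integers(len(INSTRUCTIONS)))
        obs = rng.uniform(-1.0, 3.0)
        X.append(features(obs, INSTRUCTIONS[k]))
        Y.append(0.5 * (GOALS[k] - obs))
    return np.array(X), np.array(Y)


def fit_bc(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Behavior cloning with a linear policy: least-squares fit of expert actions (MSE loss)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def policy(W: np.ndarray, obs: float, instruction: str) -> float:
    """Predicted task-space action for an observation and a language instruction."""
    return float(features(obs, instruction) @ W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, Y = make_demos(200, rng)
    W = fit_bc(X, Y)
    # Same observation, different instruction -> different action.
    print(policy(W, 0.0, "reach target A"), policy(W, 0.0, "reach target B"))
```

Because the instruction embedding is part of the input, the same observation yields different actions under different instructions (approximately 0.5 versus 1.0 at `obs = 0.0` in this toy setup), which is the sense in which the policy is "language-conditioned".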

🔎 Similar Papers

From Decision to Action in Surgical Autonomy: Multi-Modal Large Language Models for Robot-Assisted Blood Suction