Generalized Recognition of Basic Surgical Actions Enables Skill Assessment and Vision-Language-Model-based Surgical Planning

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current intelligent surgical systems lack the capability to universally recognize basic surgical actions (BSAs) across specialties, limiting their application in skill assessment and automated surgical planning. This work addresses the gap by constructing the first large-scale, multi-scenario video dataset, comprising over 11,000 clips spanning six surgical specialties and ten BSA categories. The authors propose the first foundation model for generic BSA recognition, integrating a vision-language architecture with domain-specific surgical knowledge. The model demonstrates strong generalization across surgical specialties and anatomical sites. Its derived skill assessments prove effective in prostatectomy, and its generated surgical planning narratives were validated by clinical experts from multiple countries on cholecystectomy and nephrectomy, advancing the development of interpretable and generalizable surgical superintelligence.

📝 Abstract
Artificial intelligence, imaging, and large language models have the potential to transform surgical practice, training, and automation. Understanding and modeling basic surgical actions (BSAs), the fundamental units of operation in any surgery, is important to drive the evolution of this field. In this paper, we present a BSA dataset comprising 10 basic actions across 6 surgical specialties with over 11,000 video clips, the largest to date. Based on the BSA dataset, we developed a new foundation model that performs general-purpose recognition of basic actions. Our approach demonstrates robust cross-specialty performance in experiments validated on datasets from different procedure types and various body parts. Furthermore, we demonstrate downstream applications enabled by the BSA foundation model: surgical skill assessment in prostatectomy using domain-specific knowledge, and action planning in cholecystectomy and nephrectomy using large vision-language models. Evaluation of the language model's explanatory action-planning texts by multinational surgeons demonstrated their clinical relevance. These findings indicate that basic surgical actions can be robustly recognized across scenarios, and that an accurate BSA understanding model can substantially facilitate complex applications and accelerate the realization of surgical superintelligence.
Problem

Research questions and friction points this paper is trying to address.

Basic Surgical Actions
Surgical Skill Assessment
Vision-Language Models
Surgical Planning
Action Recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Basic Surgical Actions
Foundation Model
Surgical Skill Assessment
Vision-Language Model
Action Planning
Mengya Xu
The Chinese University of Hong Kong
Vision-Language based Surgical Scene Understanding
Daiyun Shen
PhD at National University of Singapore
Medical AI
Jie Zhang
The Chinese University of Hong Kong
Hardware Security & Reliability
Hon Chi Yip
Department of Surgery, The Chinese University of Hong Kong, Hong Kong, China
Yujia Gao
Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, National University Hospital, Singapore
Cheng Chen
Assistant Professor, The University of Hong Kong
Medical Image Analysis, AI for Healthcare, Deep Learning
Dillan Imans
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Yonghao Long
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Yiru Ye
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Yixiao Liu
Department of Urology, Peking University Third Hospital, Beijing, China
Rongyun Mai
Department of Hepatobiliary and Pancreatic Surgery, Guangxi Medical University Cancer Hospital, Guangxi Medical University, Nanning, China
Kai Chen
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Hongliang Ren
Chinese University of Hong Kong | National University of Singapore | JHU/Harvard(RF) | CUHK(PhD)
Biorobotics & intelligent systems, medical mechatronics, continuum/soft flexible robots & sensors, multisensory perception
Yutong Ban
Global College, Shanghai Jiao Tong University, Shanghai, China
Guangsuo Wang
Department of Thoracic Surgery, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, China
Francis Wong
Division of Urology, Department of Surgery, The Chinese University of Hong Kong, Hong Kong, China
Chi-Fai Ng
Division of Urology, Department of Surgery, The Chinese University of Hong Kong, Hong Kong, China
Kee Yuan Ngiam
National University Hospital Singapore
Endocrine surgery, Artificial Intelligence, Big Data analytics, Bariatric and Metabolic Surgery
Russell H. Taylor
John C. Malone Professor of Computer Science, Johns Hopkins University
Robotics, Medical Robotics, Computer-Integrated Surgery, Computer-Assisted Surgery
Daguang Xu
Senior Research Manager at NVIDIA
Deep Learning, Machine Learning, Medical Image Analysis, Compressive Sensing, Sparse Coding
Yueming Jin
Assistant Professor, National University of Singapore
Medical Image Analysis, Surgical AI & Robotics, Multimodal Learning
Qi Dou
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China