A Multi-Agent LLM Framework for Rating the Quality of Surgical Feedback

📅 2026-05-25
🤖 AI Summary
Existing approaches struggle to automatically evaluate the quality of attending surgeons’ verbal feedback during surgical training and its actual effect on residents’ behavior. This work proposes a two-stage, multi-agent large language model framework that incorporates surgical domain knowledge through multi-agent prompting and knowledge injection. The framework discovers interpretable, human-aligned dimensions of feedback quality, such as clarity and urgency, that conventional content-based analysis overlooks. On 4,200 real-world feedback instances, the discovered scoring criteria outperform existing content-oriented approaches in predicting feedback effectiveness, including resident behavioral adjustments and attending surgeon approval.
📝 Abstract
Verbal feedback delivered by attending surgeons in the operating room plays a critical formative role in resident trainees' skill acquisition. Yet assessing the quality of trainer feedback and its effectiveness in influencing trainee behavior during live surgery remains a challenge. Prior studies assessed feedback content through extensive manual annotation by expert human raters and focused on developing broad taxonomies that overlook qualitative aspects of feedback delivery, such as clarity or urgency. The few existing automated methods, including keyword analysis and topic modeling, also fail to capture these nuanced aspects. We introduce a two-stage LLM-based framework that discovers interpretable feedback quality criteria grounded in the context of surgical training. Our method uses multi-agent prompting and surgical domain knowledge injection to discover a small set of human-interpretable scoring criteria (e.g., Encouraging, Urgent, Clear). These criteria are then used to automatically score live surgical feedback via an LLM-as-a-judge approach. Evaluation on 4.2k trainer feedback instances demonstrates that our AI-discovered criteria outperform prior content-based frameworks in predicting feedback effectiveness, including observed trainee behavioral adjustments and trainer approval. This work advances scalable, human-aligned assessment of communication quality in the operating room and provides a foundation for improving surgical teaching practices.
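The second stage described in the abstract, scoring feedback against discovered criteria with an LLM-as-a-judge, can be illustrated with a minimal sketch. This is not the authors' implementation. The three criteria are the examples the abstract names; the 1 to 5 scale, the prompt wording, and the names `build_prompt`, `parse_score`, `score_feedback`, and `stub_judge` are assumptions made for illustration. The judge is passed in as a plain callable, so a deterministic stub stands in for a real model call.

```python
import re
from typing import Callable, Dict, Sequence

# Example criteria named in the abstract; the full discovered set is not given there.
CRITERIA = ("Encouraging", "Urgent", "Clear")


def build_prompt(feedback: str, criterion: str) -> str:
    """Build a judge prompt for one criterion (wording is an assumption)."""
    return (
        f"Rate the following surgical feedback on '{criterion}' "
        "from 1 (low) to 5 (high). Reply with a single integer.\n"
        f"Feedback: {feedback}"
    )


def parse_score(reply: str, lo: int = 1, hi: int = 5) -> int:
    """Take the first integer in the judge's reply and clamp it to [lo, hi]."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        raise ValueError(f"no integer score found in reply: {reply!r}")
    return max(lo, min(hi, int(match.group())))


def score_feedback(
    feedback: str,
    judge: Callable[[str], str],
    criteria: Sequence[str] = CRITERIA,
) -> Dict[str, int]:
    """Score one feedback utterance on every criterion, one judge call each."""
    return {c: parse_score(judge(build_prompt(feedback, c))) for c in criteria}


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned replies."""
    return "Score: 4" if "'Clear'" in prompt else "3"


if __name__ == "__main__":
    print(score_feedback("Pull the tissue toward you, slowly.", stub_judge))
```

Keeping the judge as an injected callable separates the scoring logic from any particular model provider, and the per-criterion scores can then serve as features for predicting outcomes such as trainee behavioral adjustment.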
Problem

Research questions and friction points this paper is trying to address.

surgical feedback
feedback quality
trainee behavior
operating room communication
formative assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent LLM
Feedback Quality Assessment
Surgical Education
LLM-as-a-Judge
Interpretable Criteria
Authors
Rafal Kocielnik
Computing + Mathematical Sciences, California Institute of Technology, 1200 E. California Blvd, Pasadena, 91125, CA, USA.
J. Everett Knudsen
Keck School of Medicine, University of Southern California, 1500 San Pablo Street, Los Angeles, 90033, CA, USA.
Steven Y. Cen
Keck School of Medicine, University of Southern California, 1500 San Pablo Street, Los Angeles, 90033, CA, USA.
Jasmine Lin
Department of Urology, Cedars-Sinai, 8700 Beverly Blvd, Los Angeles, 90048, CA, USA.
Cherine H. Yang
Department of Urology, Cedars-Sinai, 8700 Beverly Blvd, Los Angeles, 90048, CA, USA.
Atharva Deo
Department of Urology, Cedars-Sinai, 8700 Beverly Blvd, Los Angeles, 90048, CA, USA.
Ujjwal Pasupulety
University of Southern California
Artificial Intelligence, Natural Language Processing, Psychology, Mental Health, Psychotherapy
Peter Wager
Department of Urology, Cedars-Sinai, 8700 Beverly Blvd, Los Angeles, 90048, CA, USA.
Anima Anandkumar
California Institute of Technology and NVIDIA
Machine Learning and Artificial Intelligence
Andrew J. Hung
Vice Chair for Academic Development, Department of Urology, Cedars-Sinai Medical Center
Surgical Assessment & Training, Robotic Surgery, Machine Learning