🤖 AI Summary
To address the poor timeliness, limited accessibility, and inconsistency of manual feedback in surgical training, this paper proposes a structure-aware framework for automatic natural-language surgical feedback generation. Methodologically: (1) it mines an Instrument-Action-Target (IAT) ontology from real trainer-to-trainee transcripts to give a structured, interpretable representation of surgical actions; (2) it fine-tunes a context-augmented video-to-IAT recognition model with fine-grained temporal modeling of instrument motion; and (3) it conditions GPT-4o on IAT triplets to generate trainer-style feedback. The key contribution is grounding large-model generation in explicit semantic structure, which improves clinical fidelity and yields clinician-verifiable rationales. Experiments show consistent gains: IAT recognition AUC improves (Instrument: 0.67 to 0.74; Action: 0.60 to 0.63; Target: 0.74 to 0.79); feedback fidelity rises from 2.17 to 2.44 (+12.4%); the share of admissible generations (score ≥ 3 on a 1-5 rubric) doubles from 21% to 42%; word error rate decreases by 15-31%; and ROUGE increases by 9-64%.
📝 Abstract
High-quality intraoperative feedback from a surgical trainer is pivotal for improving trainee performance and long-term skill acquisition. Automating natural, trainer-style feedback promises timely, accessible, and consistent guidance at scale but requires models that understand clinically relevant representations. We present a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts (33 surgeries) and uses it to condition feedback generation. We contribute by (1) mining Instrument-Action-Target (IAT) triplets from real-world feedback text and clustering surface forms into normalized categories, (2) fine-tuning a video-to-IAT model that leverages the surgical procedure and task contexts as well as fine-grained temporal instrument motion, and (3) demonstrating how to effectively use IAT triplet representations to guide GPT-4o in generating clinically grounded, trainer-style feedback. We show that, on Task 1: Video-to-IAT recognition, our context injection and temporal tracking deliver consistent AUC gains (Instrument: 0.67 to 0.74; Action: 0.60 to 0.63; Tissue: 0.74 to 0.79). For Task 2: feedback text generation (rated on a 1-5 fidelity rubric where 1 = opposite/unsafe, 3 = admissible, and 5 = perfect match to a human trainer), GPT-4o from video alone scores 2.17, while IAT conditioning reaches 2.44 (+12.4%), doubling the share of admissible generations (score ≥ 3) from 21% to 42%. Traditional text-similarity metrics also improve: word error rate decreases by 15-31% and ROUGE (phrase/substring overlap) increases by 9-64%. Grounding generation in explicit IAT structure improves fidelity and yields clinician-verifiable rationales, supporting auditable use in surgical training.
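One way to picture the IAT-conditioning step is as prompt assembly: recognized triplets are serialized into explicit structure that the language model must stay faithful to. The paper does not publish its prompt, so the class, function, and wording below are illustrative assumptions only:

```python
from dataclasses import dataclass


@dataclass
class IATTriplet:
    """One recognized surgical action: Instrument-Action-Target."""
    instrument: str
    action: str
    target: str


def build_feedback_prompt(triplets, procedure, task):
    """Assemble a hypothetical GPT-4o prompt grounded in recognized IAT structure.

    The procedure/task context mirrors the paper's context injection; the
    prompt text itself is an assumption, not the authors' actual template.
    """
    structure = "; ".join(
        f"{t.instrument} -> {t.action} -> {t.target}" for t in triplets
    )
    return (
        f"Procedure: {procedure}. Task: {task}.\n"
        f"Recognized actions (Instrument -> Action -> Target): {structure}.\n"
        "As a surgical trainer, give one short, specific piece of "
        "intraoperative feedback grounded only in the actions listed above."
    )
```

Because the generated feedback is tied to the serialized triplets, a reviewer can check each claim against the recognized structure, which is what makes the output clinician-verifiable.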