ReCAP: Recursive Cross Attention Network for Pseudo-Label Generation in Robotic Surgical Skill Assessment

📅 2024-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing surgical skill assessment methods on the JIGSAWS dataset mostly regress only the aggregate Global Rating Scale (GRS) score, which overlooks clinically meaningful variation across the six OSATS dimensions and over the course of a trial. To address this, the authors propose a recurrent transformer that tracks a surgeon's performance throughout a session from kinematic data, mapping hidden states to the six OSATS scores under a clinically motivated objective function. The OSATS predictions are averaged to predict GRS. The model also generates weakly supervised pseudo-labels at the segment level, and a senior surgeon agreed with 77% of these predictions (p=0.006). Reported Spearman correlations are 0.83-0.88 for GRS, 0.46-0.70 for average OSATS predictions, and 0.56-0.95 for specific OSATS dimensions. The model outperforms state-of-the-art kinematic-only approaches on GRS, matches video-based models, and surpasses the state of the art on most OSATS tasks, while translating quantitative predictions into qualitative feedback.

📝 Abstract

In surgical skill assessment, the Objective Structured Assessment of Technical Skills (OSATS) and Global Rating Scale (GRS) are well-established tools for evaluating surgeons during training. These metrics, along with performance feedback, help surgeons improve and reach practice standards. Recent research on the open-source JIGSAWS dataset, which includes both GRS and OSATS labels, has focused on regressing GRS scores from kinematic data, video, or their combination. However, we argue that regressing GRS alone is limiting, as it aggregates OSATS scores and overlooks clinically meaningful variations during a surgical trial. To address this, we developed a recurrent transformer model that tracks a surgeon's performance throughout a session by mapping hidden states, derived from kinematic data, to six OSATS scores using a clinically motivated objective function. These OSATS scores are averaged to predict GRS, allowing us to compare our model's performance against state-of-the-art (SOTA) methods. We report Spearman's Correlation Coefficients (SCC) demonstrating that our model outperforms SOTA using kinematic data (SCC 0.83-0.88), and matches performance with video-based models. Our model also surpasses SOTA in most tasks for average OSATS predictions (SCC 0.46-0.70) and specific OSATS (SCC 0.56-0.95). The generation of pseudo-labels at the segment level translates quantitative predictions into qualitative feedback, vital for automated surgical skill assessment pipelines. A senior surgeon validated our model's outputs, agreeing with 77% of the weakly-supervised predictions (p=0.006).
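The evaluation described in the abstract (average six OSATS predictions into a GRS estimate, then compare against reference scores with Spearman's correlation) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the four example trials, and the reference GRS values are all hypothetical, and the reference scores are assumed to be on the same averaged scale. Because Spearman's coefficient depends only on ranks, averaging the six OSATS scores gives the same correlation as summing them.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def grs_from_osats(osats_scores):
    """Predict GRS as the mean of the six predicted OSATS scores."""
    return mean(osats_scores)


# Hypothetical per-trial predictions for the six OSATS dimensions.
predicted_osats = [
    [2.1, 1.8, 2.4, 2.0, 1.9, 2.2],
    [3.9, 4.2, 3.7, 4.0, 4.1, 3.8],
    [3.0, 2.7, 3.2, 2.9, 3.1, 3.0],
    [4.6, 4.4, 4.8, 4.5, 4.7, 4.3],
]
# Hypothetical reference GRS values (illustrative only, not from JIGSAWS).
reference_grs = [2.0, 3.5, 3.8, 4.7]

predicted_grs = [grs_from_osats(trial) for trial in predicted_osats]
scc = spearman(predicted_grs, reference_grs)
print(f"SCC = {scc:.2f}")
```

In this toy example the second and third trials are ranked in the opposite order to the reference, so the correlation falls below 1 even though the predicted values are numerically close.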

Problem

Research questions and friction points this paper is trying to address.

Develops a model that tracks surgeon performance throughout a session using OSATS scores

Predicts GRS by averaging the six predicted OSATS scores

Generates pseudo-labels for automated surgical feedback

Innovation

Methods, ideas, or system contributions that make the work stand out.

Weakly-supervised recurrent transformer model

Tracks performance by mapping hidden states to six OSATS scores

Generates pseudo-labels for qualitative feedback
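To make the idea of turning segment-level scores into qualitative feedback concrete, here is a deliberately simple sketch. The summary does not describe how ReCAP derives its pseudo-labels, so this is not the paper's mechanism: the function `segment_feedback`, the fixed-length segmentation, the `tolerance` threshold, and the label wording are all assumptions made for illustration. It labels each segment by comparing its mean predicted score with the mean over the whole trial.

```python
def segment_feedback(step_scores, segment_len, tolerance=0.25):
    """Label fixed-length segments relative to the trial-level mean score.

    Illustrative thresholding rule only (an assumption, not ReCAP's method).
    Returns a list of (start, end, label) tuples with end exclusive.
    """
    trial_mean = sum(step_scores) / len(step_scores)
    labels = []
    for start in range(0, len(step_scores), segment_len):
        segment = step_scores[start:start + segment_len]
        seg_mean = sum(segment) / len(segment)
        if seg_mean > trial_mean + tolerance:
            label = "above trial average"
        elif seg_mean < trial_mean - tolerance:
            label = "below trial average"
        else:
            label = "near trial average"
        labels.append((start, start + len(segment), label))
    return labels


# Hypothetical per-time-step predictions for one OSATS dimension.
scores = [2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
feedback = segment_feedback(scores, segment_len=2)
for start, end, label in feedback:
    print(f"steps {start}-{end - 1}: {label}")
```

A rule like this only shows how a per-segment number can become a readable statement. Any real pipeline would need labels that a surgeon can check, as the expert validation reported in the abstract did.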

🔎 Similar Papers

Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery