🤖 AI Summary
This study addresses a key bottleneck in surgical video analysis: downstream tasks such as surgical phase recognition require extensive expert annotations for fine-tuning. We propose a retraining-free transferability evaluation approach based on the embedding features of pretrained models. For the first time, we systematically benchmark three source-independent metrics (LogME, H-Score, and TransRate) on the RAMIE and AutoLaparo datasets. Results show that LogME, especially with subset-min aggregation, achieves the highest correlation with actual fine-tuning performance; H-Score exhibits limited predictive power, while TransRate suffers from rank reversal. Our key contribution is identifying the discriminative failure of existing metrics when candidate models yield comparable performance, and proposing a principled model-selection strategy that jointly considers feature diversity and validation fidelity. This approach substantially reduces annotation dependency and improves the efficiency of selecting the most transferable model for surgical video analysis.
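To make the retraining-free idea concrete, the sketch below scores two of the benchmarked metrics directly from frozen embeddings and labels, with no fine-tuning. This is a minimal illustration of the published formulations, not the paper's code: the function names are ours, and the coding-rate constants in `transrate` vary across implementations.

```python
import numpy as np

def h_score(features: np.ndarray, labels: np.ndarray) -> float:
    """H-Score (Bao et al., 2019): tr(cov(f)^-1 @ cov(g)), where g replaces each
    sample's feature with its class-mean feature. Higher = more transferable."""
    f = features - features.mean(axis=0, keepdims=True)
    cov_f = np.cov(f, rowvar=False)
    g = np.empty_like(f)
    for c in np.unique(labels):
        mask = labels == c
        g[mask] = f[mask].mean(axis=0)
    cov_g = np.cov(g, rowvar=False)
    # Pseudo-inverse guards against an ill-conditioned feature covariance.
    return float(np.trace(np.linalg.pinv(cov_f) @ cov_g))

def coding_rate(Z: np.ndarray, eps: float = 1e-4) -> float:
    """Rate-distortion coding rate: 0.5 * logdet(I + d/(n*eps) * Z^T Z)."""
    n, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps)) * Z.T @ Z)
    return 0.5 * logdet

def transrate(features: np.ndarray, labels: np.ndarray, eps: float = 1e-4) -> float:
    """TransRate (Huang et al., 2022): R(Z) - R(Z|Y), i.e. how much coding rate
    the labels explain away. Higher = more transferable."""
    Z = features - features.mean(axis=0, keepdims=True)
    r_z = coding_rate(Z, eps)
    # Class-conditional rate, averaged over classes (the weighting is one
    # design choice; class-frequency weighting is another).
    r_zy = np.mean([coding_rate(Z[labels == c], eps) for c in np.unique(labels)])
    return r_z - r_zy
```

Ranking candidate encoders then reduces to extracting embeddings once per model and sorting by the chosen score, which is what makes the approach annotation- and compute-efficient.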
📝 Abstract
Fine-tuning pre-trained models has become a cornerstone of modern machine learning, allowing practitioners to achieve high performance with limited labeled data. In surgical video analysis, where expert annotations are especially time-consuming and costly, identifying the most suitable pre-trained model for a downstream task is both critical and challenging. Source-independent transferability estimation (SITE) offers a solution by predicting how well a model will fine-tune on target data using only its embeddings or outputs, without requiring full retraining. In this work, we formalize SITE for surgical phase recognition and provide the first comprehensive benchmark of three representative metrics, LogME, H-Score, and TransRate, on two diverse datasets (RAMIE and AutoLaparo). Our results show that LogME, particularly when aggregated by the minimum per-subset score, aligns most closely with fine-tuning accuracy; H-Score yields only weak predictive power; and TransRate often inverts the true model rankings. Ablation studies show that when candidate models perform similarly, transferability estimates lose discriminative power, underscoring the importance of maintaining model diversity or using additional validation. We conclude with practical guidelines for model selection and outline future directions toward domain-specific metrics, theoretical foundations, and interactive benchmarking tools.
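The snippet below sketches the minimum-per-subset aggregation described above: each subset (e.g., each surgical video) is scored separately and only the worst score is kept, penalizing models that transfer well only on easy subsets. The helper name and the `score_fn` interface are illustrative assumptions; any per-dataset metric, such as a LogME scorer, can be plugged in.

```python
import numpy as np
from typing import Callable

def subset_min_score(
    score_fn: Callable[[np.ndarray, np.ndarray], float],  # e.g. a LogME scorer
    features: np.ndarray,    # [N, D] frozen embeddings of the target data
    labels: np.ndarray,      # [N] surgical phase labels
    subset_ids: np.ndarray,  # [N] subset id per frame (e.g. surgical video id)
) -> float:
    """Aggregate a transferability metric by its minimum over data subsets,
    so a model ranks highly only if it transfers well everywhere."""
    return min(
        score_fn(features[subset_ids == s], labels[subset_ids == s])
        for s in np.unique(subset_ids)
    )

# Hypothetical selection loop: embed the target data once per candidate model,
# then keep the candidate with the best worst-case (subset-min) score:
# best = max(candidates, key=lambda m: subset_min_score(logme, m.embed(X), y, vids))
```

Pairing this worst-case aggregate with a small held-out validation set is one way to recover discriminative power when candidate models score within noise of each other, as the ablation studies suggest.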