🤖 AI Summary
Biomedical foundation models (e.g., BERT, GPT-2, ViT) suffer from high inference latency and “overthinking”, which force an efficiency–accuracy trade-off in real-time clinical settings. To address this, we propose Entropy- and Patience-based Early Exiting (EPEE), a novel hybrid early-exit strategy. EPEE introduces the first dual-criterion mechanism that jointly leverages entropy-based confidence estimation and adaptive patience counting, integrated with inter-layer confidence calibration and multi-task joint exit decision-making. This enables cross-modal, task-agnostic dynamic early exiting. Compatible with both text- and image-based models in a plug-and-play manner, EPEE achieves an average 2.1× speedup across 12 clinical benchmarks, with accuracy maintained or improved by 0.3–1.7%. The method significantly enhances real-time performance and reliability for clinical classification, relation extraction, and event extraction tasks.
📝 Abstract
Foundation models, including language models (e.g., GPT) and vision models (e.g., CLIP), have significantly advanced numerous biomedical tasks. Despite these advancements, high inference latency and "overthinking" during inference impair the efficiency and effectiveness of foundation models, limiting their application in real-time clinical settings. To address these challenges, we proposed EPEE (Entropy- and Patience-based Early Exiting), a novel hybrid strategy designed to improve the inference efficiency of foundation models. The core idea was to combine entropy-based and patience-based early exiting so that the strengths of each method offset the weaknesses of the other. To evaluate EPEE, we conducted experiments on three core biomedical tasks (classification, relation extraction, and event extraction) using four foundation models (BERT, ALBERT, GPT-2, and ViT) across twelve datasets, including clinical notes and medical images. The results showed that EPEE significantly reduced inference time while maintaining or improving accuracy, demonstrating its adaptability to diverse datasets and tasks. EPEE addressed critical barriers to deploying foundation models in healthcare by balancing efficiency and effectiveness. It provided a potentially practical solution for real-time clinical decision-making with foundation models, supporting reliable and efficient workflows.
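To make the hybrid idea concrete, the following is a minimal sketch of how an entropy criterion and a patience criterion might be combined at inference time. It is not the authors' implementation: the abstract does not state the exact exit rule, so the sketch assumes that inference stops at the first intermediate classifier where either the prediction entropy falls below a threshold or the same class has been predicted by a given number of consecutive layers. The function names `entropy` and `hybrid_early_exit`, the parameters `entropy_threshold` and `patience`, and the example probabilities are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def hybrid_early_exit(
    layer_probs: List[Sequence[float]],
    entropy_threshold: float,
    patience: int,
) -> Tuple[int, int]:
    """Return (exit_layer_index, predicted_class).

    layer_probs[i] holds the class probabilities produced by the
    (hypothetical) classifier attached to layer i. Assumed rule: exit
    at the first layer where the entropy is below `entropy_threshold`
    OR the same class has been predicted by `patience` consecutive
    layers, counting the current one. If neither criterion fires, the
    last layer's prediction is returned.
    """
    if not layer_probs:
        raise ValueError("layer_probs must contain at least one layer")

    streak = 0  # consecutive layers agreeing on the same class
    prev_pred: Optional[int] = None
    for i, probs in enumerate(layer_probs):
        pred = max(range(len(probs)), key=probs.__getitem__)
        streak = streak + 1 if pred == prev_pred else 1
        prev_pred = pred
        confident = entropy(probs) < entropy_threshold  # entropy criterion
        stable = streak >= patience                     # patience criterion
        if confident or stable:
            return i, pred
    return len(layer_probs) - 1, pred


if __name__ == "__main__":
    # No layer is confident, but three layers in a row agree on class 0,
    # so the patience criterion triggers the exit at layer index 2.
    outputs = [[0.5, 0.5], [0.6, 0.4], [0.55, 0.45], [0.6, 0.4]]
    print(hybrid_early_exit(outputs, entropy_threshold=0.1, patience=3))
```

Under this assumed rule, easy inputs exit as soon as one layer is confident, while inputs on which no single layer is confident can still exit once successive layers agree. This is one reading of how the two methods could cover each other's weaknesses.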