EPEE: Towards Efficient and Effective Foundation Models in Biomedicine

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Biomedical foundation models (e.g., BERT, GPT-2, ViT) suffer from high inference latency and from “overthinking”, which together degrade efficiency and accuracy in real-time clinical settings. To address this, we propose Entropy- and Patience-driven Early Exiting (EPEE), a novel hybrid early-exit strategy. EPEE introduces the first dual-criterion mechanism jointly leveraging entropy-based confidence estimation and adaptive patience counting, integrated with inter-layer confidence calibration and multi-task joint exit decision-making. This enables cross-modal, task-agnostic dynamic early exiting. Compatible with both text- and image-based models in a plug-and-play manner, EPEE achieves an average 2.1× speedup across 12 clinical benchmarks, with accuracy maintained or improved by 0.3–1.7%. The method significantly enhances real-time performance and reliability for clinical classification, relation extraction, and event extraction tasks.

📝 Abstract
Foundation models, including language models, e.g., GPT, and vision models, e.g., CLIP, have significantly advanced numerous biomedical tasks. Despite these advancements, the high inference latency and the "overthinking" issues in model inference impair the efficiency and effectiveness of foundation models, thus limiting their application in real-time clinical settings. To address these challenges, we proposed EPEE (Entropy- and Patience-based Early Exiting), a novel hybrid strategy designed to improve the inference efficiency of foundation models. The core idea was to leverage the strengths of entropy-based and patience-based early exiting methods to overcome their respective weaknesses. To evaluate EPEE, we conducted experiments on three core biomedical tasks (classification, relation extraction, and event extraction) using four foundation models (BERT, ALBERT, GPT-2, and ViT) across twelve datasets, including clinical notes and medical images. The results showed that EPEE significantly reduced inference time while maintaining or improving accuracy, demonstrating its adaptability to diverse datasets and tasks. EPEE addressed critical barriers to deploying foundation models in healthcare by balancing efficiency and effectiveness. It potentially provided a practical solution for real-time clinical decision-making with foundation models, supporting reliable and efficient workflows.
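For readers unfamiliar with the two exit criteria named in the abstract, they are commonly formalized as below. This is a hedged sketch of the general idea, not the paper's exact formulation: the symbols $p^{(l)}$ (class probabilities from an internal classifier at layer $l$), $C$ (number of classes), $\tau$ (entropy threshold), and $t$ (patience) are notation assumed here, and the "either criterion" combination in the last line is an assumption about how the two signals might be merged.

```latex
% Entropy of the layer-l prediction (low entropy = confident prediction)
H\big(p^{(l)}\big) = -\sum_{c=1}^{C} p^{(l)}_{c} \log p^{(l)}_{c}

% Entropy-based exit: stop at the first layer whose prediction is confident enough
H\big(p^{(l)}\big) < \tau

% Patience-based exit: stop once t consecutive layers agree on the predicted class
\hat{y}^{(l)} = \hat{y}^{(l-1)} = \dots = \hat{y}^{(l-t+1)},
\qquad \hat{y}^{(l)} = \arg\max_{c}\, p^{(l)}_{c}

% Assumed hybrid rule: exit at the first layer where either condition holds
H\big(p^{(l)}\big) < \tau
\;\;\lor\;\;
\hat{y}^{(l)} = \dots = \hat{y}^{(l-t+1)}
```

Entropy alone can stall on inputs whose predictions stay stable but never become sharply peaked, while patience alone always needs at least $t$ layers even for easy inputs; a combined rule is one way to cover both cases.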
Problem

Research questions and friction points this paper is trying to address.

Reduces high inference latency in biomedical foundation models.
Addresses 'overthinking' issues during model inference.
Enhances efficiency and effectiveness for real-time clinical applications.
Innovation

Methods, ideas, or system contributions that make the work stand out.

EPEE combines entropy- and patience-based early exiting.
Reduces inference time while maintaining accuracy.
Adaptable to diverse biomedical datasets and tasks.
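As an illustration of how entropy- and patience-based exiting might be combined, here is a minimal Python sketch. It is not the authors' implementation: the function names `entropy` and `hybrid_early_exit`, the parameters `entropy_threshold` and `patience`, and the choice to exit when either criterion fires are all assumptions made for this example. Per-layer class probabilities are passed in as plain lists; in a real model they would come from internal classifiers attached to each layer.

```python
import math
from typing import Iterable, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def hybrid_early_exit(
    layer_probs: Iterable[Sequence[float]],
    entropy_threshold: float,
    patience: int,
) -> Tuple[int, int]:
    """Return (predicted_class, exit_layer_index).

    Assumed rule: exit at the first layer where the prediction entropy is
    below `entropy_threshold` OR the same class has been predicted by
    `patience` consecutive layers. Falls back to the last layer otherwise.
    """
    streak = 0          # consecutive layers agreeing on the current class
    prev_pred = None
    pred, index = None, -1
    for index, probs in enumerate(layer_probs):
        pred = max(range(len(probs)), key=probs.__getitem__)
        streak = streak + 1 if pred == prev_pred else 1
        prev_pred = pred
        if entropy(probs) < entropy_threshold or streak >= patience:
            return pred, index  # early exit: later layers are never consumed
    if pred is None:
        raise ValueError("layer_probs must contain at least one layer")
    return pred, index  # no criterion fired; use the final layer


# Confident at layer 1, so the entropy criterion fires first.
confident = [[0.5, 0.5], [0.98, 0.02], [0.6, 0.4]]
print(hybrid_early_exit(confident, entropy_threshold=0.3, patience=5))

# Never very confident, but layers 1-3 agree, so patience fires at layer 3.
stable = [[0.6, 0.4], [0.4, 0.6], [0.45, 0.55], [0.4, 0.6], [0.9, 0.1]]
print(hybrid_early_exit(stable, entropy_threshold=0.05, patience=3))
```

Because `layer_probs` is consumed lazily, passing a generator that runs one layer per step would skip the computation of all layers after the exit point, which is where the inference-time saving comes from.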
Zaifu Zhan
PhD at University of Minnesota, MS at Tsinghua University
Natural language processing, Machine Learning, AI for Biomedicine, Large Language model
Shuang Zhou
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, United States
Huixue Zhou
PhD candidate at University of Minnesota
Natural Language Processing, Health Informatics
Zirui Liu
Peking University
Systems, Algorithms, Data Structures
Rui Zhang
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, United States