Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones

📅 2024-12-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Large and small models struggle to jointly achieve high accuracy and efficiency under the same computational budget. Method: We propose Performance Control Early Exiting (PCEE), an early-exit mechanism that replaces conventional single-threshold confidence-based exit decisions with decisions calibrated on a held-out validation set: a sample exits when the average accuracy of validation samples with similar confidence meets a target, enabling fine-grained, controllable accuracy tuning. PCEE combines dynamic computation allocation with an early-exit architecture, optimizing inference paths without increasing FLOPs. Results: On BERT-large, PCEE achieves a 2.1% accuracy gain over BERT-base under the same computational budget, reduces inference latency by 37%, and maintains accuracy control within ±0.3%. This work is the first to systematically demonstrate that "large model + early exit" can outperform small models end-to-end while mitigating overconfidence-induced performance degradation in large models.

📝 Abstract
Early Exiting (EE) is a promising technique for speeding up inference by adaptively allocating compute resources to data points based on their difficulty. The approach enables predictions to exit at earlier layers for simpler samples while reserving more computation for challenging ones. In this study, we first present a novel perspective on the EE approach, showing that larger models deployed with EE can achieve higher performance than smaller models while maintaining similar computational costs. As existing EE approaches rely on confidence estimation at each exit point, we further study the impact of overconfidence on the controllability of the compute-performance trade-off. We introduce Performance Control Early Exiting (PCEE), a method that enables accuracy thresholding by basing decisions not on a data point's confidence but on the average accuracy of samples with similar confidence levels from a held-out validation set. In our experiments, we show that PCEE offers a simple yet computationally efficient approach that provides better control over performance than standard confidence-based approaches, and allows us to scale up model sizes to yield performance gains while reducing the computational cost.
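The exit rule described in the abstract can be sketched as follows: bin the confidence scores of a held-out validation set, record the empirical accuracy of each bin, and at inference let a sample exit as soon as the bin matching its current confidence has reached the target accuracy. This is an illustrative sketch, not the authors' implementation; the equal-width binning, bin count, and function names are assumptions.

```python
import numpy as np

def bin_accuracies(val_conf, val_correct, num_bins=10):
    """Calibration step: compute the empirical accuracy of validation
    samples falling into each equal-width confidence bin."""
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    # Map each confidence score to a bin index in [0, num_bins - 1].
    idx = np.clip(np.digitize(val_conf, edges) - 1, 0, num_bins - 1)
    acc = np.full(num_bins, np.nan)
    for b in range(num_bins):
        mask = idx == b
        if mask.any():
            acc[b] = val_correct[mask].mean()
    return edges, acc

def should_exit(confidence, edges, acc, target_accuracy):
    """Exit decision at a given exit point: exit early if validation
    samples with similar confidence reached the target accuracy."""
    b = min(np.digitize(confidence, edges) - 1, len(acc) - 1)
    return (not np.isnan(acc[b])) and acc[b] >= target_accuracy
```

In this sketch, a bin that contains no validation samples conservatively refuses the exit, so the sample continues to deeper layers; the paper may handle empty bins differently.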
Problem

Research questions and friction points this paper is trying to address.

Early Exit Strategy
Model Deployment
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

PCEE
Performance Control
Efficiency Improvement
Mehrnaz Mofakhami — ServiceNow Research, Mila & Université de Montréal
Reza Bayat — Université de Montréal, Mila
Ioannis Mitliagkas — Archimedes Unit, Athena Research Center, Athens; Mila & Université de Montréal
João Monteiro — Autodesk (work done while at ServiceNow Research)
Valentina Zantedeschi — ServiceNow, Laval University
Artificial Intelligence · Machine Learning