Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones

📅 2024-12-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Large and small models struggle to jointly achieve high accuracy and efficiency under the same computational budget. Method: We propose Performance Control Early Exiting (PCEE), an early-exit mechanism that replaces conventional single-threshold confidence-based exit decisions with decisions calibrated on a held-out validation set: a sample exits when the average accuracy of validation samples with similar confidence meets a target, enabling fine-grained, controllable accuracy tuning. PCEE combines dynamic computation allocation with an early-exit architecture, optimizing inference paths without increasing FLOPs. Results: On BERT-large, PCEE achieves a 2.1% accuracy gain over BERT-base under the same computational budget, reduces inference latency by 37%, and maintains accuracy control within ±0.3%. This work is the first to systematically demonstrate that "large model + early exit" can outperform small models end-to-end while mitigating overconfidence-induced performance degradation in large models.

📝 Abstract
Early Exiting (EE) is a promising technique for speeding up inference by adaptively allocating compute resources to data points based on their difficulty. The approach enables predictions to exit at earlier layers for simpler samples while reserving more computation for challenging ones. In this study, we first present a novel perspective on the EE approach, showing that larger models deployed with EE can achieve higher performance than smaller models while maintaining similar computational costs. As existing EE approaches rely on confidence estimation at each exit point, we further study the impact of overconfidence on the controllability of the compute-performance trade-off. We introduce Performance Control Early Exiting (PCEE), a method that enables accuracy thresholding by basing decisions not on a data point's confidence but on the average accuracy of samples with similar confidence levels from a held-out validation set. In our experiments, we show that PCEE offers a simple yet computationally efficient approach that provides better control over performance than standard confidence-based approaches, and allows us to scale up model sizes to yield performance gains while reducing the computational cost.
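The exit rule described in the abstract can be sketched as follows: bin the confidence scores of a held-out validation set, record the empirical accuracy of each bin, and at inference let a sample exit as soon as the bin matching its current confidence has reached the target accuracy. This is an illustrative sketch, not the authors' implementation; the equal-width binning, bin count, and function names are assumptions.

```python
import numpy as np

def bin_accuracies(val_conf, val_correct, num_bins=10):
    """Calibration step: compute the empirical accuracy of validation
    samples falling into each equal-width confidence bin."""
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    # Map each confidence score to a bin index in [0, num_bins - 1].
    idx = np.clip(np.digitize(val_conf, edges) - 1, 0, num_bins - 1)
    acc = np.full(num_bins, np.nan)
    for b in range(num_bins):
        mask = idx == b
        if mask.any():
            acc[b] = val_correct[mask].mean()
    return edges, acc

def should_exit(confidence, edges, acc, target_accuracy):
    """Exit decision at a given exit point: exit early if validation
    samples with similar confidence reached the target accuracy."""
    b = min(np.digitize(confidence, edges) - 1, len(acc) - 1)
    return (not np.isnan(acc[b])) and acc[b] >= target_accuracy
```

In this sketch, a bin that contains no validation samples conservatively refuses the exit, so the sample continues to deeper layers; the paper may handle empty bins differently.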
Problem

Research questions and friction points this paper is trying to address.

Early Exit Strategy
Model Deployment
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

PCEE
Performance Control
Efficiency Improvement
Mehrnaz Mofakhami — ServiceNow Research, Mila & Université de Montréal
Reza Bayat — Université de Montréal, Mila
Ioannis Mitliagkas — Archimedes Unit, Athena Research Center, Athens; Mila & Université de Montréal
João Monteiro — Autodesk (work done while at ServiceNow Research)
Valentina Zantedeschi — ServiceNow, Laval University
Artificial Intelligence · Machine Learning