🤖 AI Summary
Static confidence thresholds in early-exit mechanisms often induce overconfident misclassifications and exhibit poor robustness to distribution shifts. To address this, we propose a dynamic adaptive exit framework grounded in multi-armed bandits (MAB), which learns optimal exit thresholds online. Our approach introduces a novel reward function that jointly accounts for prediction confidence and reliability, enabling uncertainty-aware latency–accuracy trade-offs. By integrating unsupervised online learning with uncertainty modeling, the framework continuously refines exit policies without requiring ground-truth labels. Evaluated on vision-language understanding, text generation, and classification tasks, our method achieves 1.70–2.10× inference speedup with less than 2% accuracy degradation—substantially outperforming static-threshold baselines. The framework thus delivers superior efficiency, reliability, and deployment robustness under distributional shifts.
📝 Abstract
Early-Exit Deep Neural Networks enable adaptive inference by allowing prediction at intermediate layers, significantly reducing computational cost and latency. Most early-exit strategies greedily exit a sample at an intermediate layer once the confidence in the class prediction exceeds a predefined threshold set on a static validation set. This is problematic because the model may be overconfident in a wrong class. Such strategies are also not robust to the distribution shifts encountered in deployment, which can undermine model trustworthiness and accuracy. To address these challenges, we propose UAT, which adapts the exit threshold using a Multi-Armed Bandit framework, enabling online, unsupervised adjustment of exit decisions. UAT makes decisions based on a new reward function that assesses predictive certainty and its reliability to balance computational efficiency and prediction quality while penalizing unnecessary late exits. We provide guarantees on the risk achieved by UAT and validate its performance on diverse tasks spanning vision-language understanding, text generation, and classification. Our framework demonstrates consistent speedups (1.70–2.10×) with a minimal performance drop (<2%) compared to full-model performance. Our source code is available at https://github.com/Div290/UAT.
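To make the abstract's core idea concrete, the sketch below shows how an exit threshold could be selected online by a bandit over a discrete set of candidate thresholds, with an unsupervised reward that trades off confidence against exit depth. This is an illustrative epsilon-greedy sketch under assumed names (`ThresholdBandit`, `proxy_reward`) and an assumed reward form; the paper's actual UAT reward function and bandit algorithm may differ.

```python
import random

class ThresholdBandit:
    """Epsilon-greedy bandit over candidate exit-confidence thresholds.

    Illustrative only: UAT's actual policy and reward are defined in the paper.
    """

    def __init__(self, thresholds, epsilon=0.1):
        self.thresholds = thresholds
        self.epsilon = epsilon
        self.counts = [0] * len(thresholds)   # pulls per arm
        self.values = [0.0] * len(thresholds) # running mean reward per arm

    def select(self):
        """Return the index of the threshold to try for the next sample."""
        if random.random() < self.epsilon:
            return random.randrange(len(self.thresholds))  # explore
        return max(range(len(self.thresholds)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        """Incremental running-mean update; no ground-truth labels required."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def proxy_reward(confidence, exit_layer, num_layers, latency_weight=0.5):
    """Assumed unsupervised reward: reward confident early exits,
    penalize late exits in proportion to the layer at which the sample exits."""
    return confidence - latency_weight * (exit_layer / num_layers)
```

A deployment loop would call `select()` to pick a threshold for each incoming sample, exit at the first layer whose confidence exceeds it, compute `proxy_reward` from the observed confidence and exit depth, and feed it back via `update()`, so the threshold adapts as the input distribution drifts.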