Failure Prediction Is a Better Performance Proxy for Early-Exit Networks Than Calibration

📅 2025-08-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Widely adopted calibration metrics for early-exit networks have a fundamental flaw: a well-calibrated intermediate classifier does not necessarily improve inference efficiency, and common calibration methods can reorder samples within a classifier, distorting performance evaluation. Method: The paper proposes *failure prediction* as a proxy metric for early-exit performance: rather than calibrating output confidences, each exit is assessed by how well it discriminates its own erroneous decisions, which directly reflects the sample ranking that exit thresholds operate on. Contribution/Results: Experiments across multiple benchmark datasets show improved throughput and accuracy–latency trade-offs; notably, failure prediction enables good exit decisions even on uncalibrated models, and correlates more strongly with realized efficiency than conventional calibration-based metrics.
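
The "discriminates its own erroneous decisions" idea can be made concrete: score each exit by how well its confidence separates correct from incorrect predictions, e.g. via AUROC. The sketch below is illustrative (not the paper's implementation); the function name and the pairwise AUROC estimator are assumptions.

```python
import numpy as np

def failure_prediction_auroc(confidences, correct):
    """AUROC of using confidence to separate correct from incorrect
    predictions at one exit: the probability that a correctly classified
    sample receives higher confidence than a misclassified one
    (ties count half). Higher means better failure prediction."""
    conf = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos, neg = conf[correct], conf[~correct]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need both correct and incorrect samples")
    # Pairwise comparison is O(n^2) but fine for a sketch;
    # use rank statistics at scale.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

Because AUROC depends only on the ordering of confidences, it is invariant to monotone recalibration but sensitive to the ranking changes that, per the summary, distort calibration-based evaluation.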

📝 Abstract
Early-exit models speed up inference by attaching internal classifiers to intermediate layers of the model and allowing computation to stop once a prediction satisfies an exit criterion. Most early-exit methods rely on confidence-based exit strategies, which motivated some works to calibrate intermediate classifiers to improve the performance of the entire model. In this paper, we show that calibration measures can be misleading indicators of the performance of multi-exit models: a well-calibrated classifier may still waste computation, and common calibration methods do not preserve the sample ranking within a classifier. We demonstrate empirical cases where miscalibrated networks outperform calibrated ones. As an alternative, we propose to use failure prediction as a more useful proxy for early-exit model performance. Unlike calibration, failure prediction accounts for changes in the ranking of samples and shows a strong correlation with efficiency improvements, making it a more dependable basis for designing and evaluating early-exit models.
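
The confidence-based exit strategy the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's method; the function name, threshold value, and the max-softmax confidence measure are assumptions.

```python
import numpy as np

def softmax(logits):
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def early_exit_predict(exit_logits, threshold=0.9):
    """Run a sample through the exits in order and stop at the first
    internal classifier whose max softmax probability clears the
    threshold. Returns (predicted_class, exit_index)."""
    for k, logits in enumerate(exit_logits):
        probs = softmax(logits)
        if probs.max() >= threshold:
            return int(probs.argmax()), k
    # No exit fired: fall back to the final classifier.
    probs = softmax(exit_logits[-1])
    return int(probs.argmax()), len(exit_logits) - 1
```

Note that the exit decision depends only on whether each sample's confidence clears the threshold, i.e. on the ranking of samples within an exit; this is why a calibration method that reorders samples can change which samples exit early even when the calibration metric improves.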
Problem

Research questions and friction points this paper is trying to address.

Evaluating calibration as a performance proxy for early-exit networks
Identifying misleading indicators of multi-exit model efficiency
Proposing failure prediction as a better performance-evaluation metric
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses failure prediction as a performance proxy
Replaces calibration as the evaluation target for intermediate classifiers
Ties sample ranking within each exit to efficiency gains