Know What You Don't Know: Selective Prediction for Early Exit DNNs

📅 2025-09-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Early-exit (EE) deep neural networks reduce inference latency but suffer from overconfidence, causing premature exits on hard instances and undermining prediction reliability. This paper proposes SPEED, the first framework integrating selective prediction with hierarchical latency-aware classifiers. At each intermediate layer, exit decisions are jointly determined by confidence scores and instance difficulty estimates; only easy instances trigger early exits, while hard instances are dynamically routed to deeper expert modules. This mechanism significantly improves both reliability and efficiency without compromising accuracy: SPEED achieves a 2.05× speedup over full-stack inference and reduces erroneous prediction risk by 50%. Its core innovation lies in a difficulty-aware dynamic exit policy—departing from conventional EE methods that rely solely on fixed confidence thresholds—thereby establishing a new paradigm for trustworthy AI deployment at the edge.

📝 Abstract
Inference latency and trustworthiness of Deep Neural Networks (DNNs) are the bottlenecks in deploying them in latency-critical, sensitive applications. Early Exit (EE) DNNs overcome the latency issue by allowing samples to exit from intermediary layers if they attain 'high' confidence scores on the predicted class. However, DNNs are known to exhibit overconfidence, which can lead to many samples exiting early and render EE strategies untrustworthy. We use Selective Prediction (SP) to overcome this issue by checking the 'hardness' of the samples rather than relying on the confidence score alone. We propose SPEED, a novel approach that uses Deferral Classifiers (DCs) at each layer to check the hardness of samples before performing EEs. Specifically, the DCs identify if a sample is hard to predict at an intermediary layer, leading to hallucination, and defer it to an expert. Early detection of hard samples prevents the wastage of computational resources and improves trust by deferring those samples to the expert. We demonstrate that EE aided with SP improves both accuracy and latency. Our method reduces the risk of wrong prediction by 50% with a speedup of 2.05× compared to final-layer inference. The anonymized source code is available at https://github.com/Div290/SPEED
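The exit policy described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the per-layer tuples, threshold values, and the `speed_exit_policy` name are all assumptions made for clarity. The key idea it shows is that a layer's deferral decision (hardness) is checked before its confidence-based exit decision.

```python
def speed_exit_policy(layer_outputs, conf_threshold=0.9, hard_threshold=0.5):
    """Difficulty-aware early exit, in the spirit of SPEED (illustrative).

    layer_outputs: list of (confidence, hardness, prediction) tuples,
    one per intermediate layer, in depth order. Thresholds are
    hypothetical values, not taken from the paper.
    """
    for depth, (confidence, hardness, prediction) in enumerate(layer_outputs, start=1):
        if hardness >= hard_threshold:
            # Deferral classifier flags a hard sample: stop spending
            # compute and route it to an expert instead of risking a
            # confidently wrong (hallucinated) early exit.
            return ("defer_to_expert", depth)
        if confidence >= conf_threshold:
            # Easy sample with high confidence: exit early here.
            return (prediction, depth)
    # No exit triggered: fall through to the final layer's prediction.
    return (layer_outputs[-1][2], len(layer_outputs))
```

For example, a sample that is easy but only becomes confident at the second layer exits there, while a sample flagged as hard at the first layer is deferred immediately, regardless of its confidence.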
Problem

Research questions and friction points this paper is trying to address.

Overcoming overconfidence in early exit DNNs
Selective prediction for trustworthy early exits
Reducing computational waste on hard samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deferral Classifiers at each layer
Selective Prediction for Early Exit
Hardness-based sample deferral to experts