🤖 AI Summary
Early-exit neural networks suffer from gradient interference during joint training, where deeper classifiers dominate shallow exits and undermine their effectiveness. To address this, we propose Confidence-Gated Training (CGT), a novel training paradigm built on conditional gradient propagation: gradients are backpropagated to deeper layers only when the current exit's prediction confidence falls below a learnable threshold, thereby aligning training dynamics with inference-time early-exit behavior. CGT explicitly grants shallow classifiers priority in decision-making, reducing redundant computation. Extensive experiments on benchmarks including Indian Pines and Fashion-MNIST demonstrate that CGT significantly reduces average inference latency (by up to 42%) while improving overall accuracy (+0.8%–1.3%) and early-exit accuracy (+3.1%–5.7%). These gains enhance deployment efficiency in resource-constrained environments without requiring architectural modifications.
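The gating rule described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it computes, for each exit in sequence, a boolean mask marking which samples should still contribute gradients at that exit, where a sample stops propagating deeper once any earlier exit predicts it with confidence above a threshold. For simplicity the threshold is a fixed constant here, whereas the paper describes it as learnable; the function name `confidence_gated_mask` is assumed for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_gated_mask(exit_logits, threshold=0.9):
    """Given per-exit logits (a list of (batch, classes) arrays, ordered
    shallow to deep), return one boolean mask per exit: a sample
    contributes gradients to exit k only if no earlier exit predicted
    it with confidence >= threshold."""
    batch = exit_logits[0].shape[0]
    unresolved = np.ones(batch, dtype=bool)  # not yet confidently predicted
    masks = []
    for logits in exit_logits:
        masks.append(unresolved.copy())
        conf = softmax(logits).max(axis=-1)
        # Confident samples exit here and stop propagating deeper.
        unresolved &= conf < threshold
    return masks
```

In a training loop, each exit's loss would be computed only over the samples its mask selects (e.g. via boolean indexing), so deeper classifiers specialize on the harder inputs that shallow exits could not resolve.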
📝 Abstract
Early-exit neural networks reduce inference cost by enabling confident predictions at intermediate layers. However, joint training often leads to gradient interference, with deeper classifiers dominating optimization. We propose Confidence-Gated Training (CGT), a paradigm that propagates gradients from deeper exits only when the preceding exits fail. This encourages shallow classifiers to act as primary decision points while reserving deeper layers for harder inputs. By aligning training with the inference-time policy, CGT mitigates overthinking, improves early-exit accuracy, and preserves efficiency. Experiments on the Indian Pines and Fashion-MNIST benchmarks show that CGT lowers average inference cost while improving overall accuracy, offering a practical solution for deploying deep models in resource-constrained environments.