🤖 AI Summary
Early-exit neural networks suffer from gradient interference during joint training, where deeper classifiers dominate shallow exits and undermine their effectiveness. To address this, we propose Confidence-Gated Training (CGT), a novel training paradigm built on conditional gradient propagation: gradients are backpropagated to deeper layers only when the current exit's prediction confidence falls below a learnable threshold, thereby aligning training dynamics with inference-time early-exit behavior. CGT explicitly grants shallow classifiers priority in decision-making, reducing redundant computation. Extensive experiments on benchmarks including Indian Pines and Fashion-MNIST demonstrate that CGT significantly reduces average inference latency (by up to 42%) while improving overall accuracy (+0.8%–1.3%) and early-exit accuracy (+3.1%–5.7%). These gains enhance deployment efficiency in resource-constrained environments without requiring architectural modifications.
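The gating rule described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it computes, for each exit in sequence, a boolean mask marking which samples should still contribute gradients at that exit, where a sample stops propagating deeper once any earlier exit predicts it with confidence above a threshold. For simplicity the threshold is a fixed constant here, whereas the paper describes it as learnable; the function name `confidence_gated_mask` is assumed for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_gated_mask(exit_logits, threshold=0.9):
    """Given per-exit logits (a list of (batch, classes) arrays, ordered
    shallow to deep), return one boolean mask per exit: a sample
    contributes gradients to exit k only if no earlier exit predicted
    it with confidence >= threshold."""
    batch = exit_logits[0].shape[0]
    unresolved = np.ones(batch, dtype=bool)  # not yet confidently predicted
    masks = []
    for logits in exit_logits:
        masks.append(unresolved.copy())
        conf = softmax(logits).max(axis=-1)
        # Confident samples exit here and stop propagating deeper.
        unresolved &= conf < threshold
    return masks
```

In a training loop, each exit's loss would be computed only over the samples its mask selects (e.g. via boolean indexing), so deeper classifiers specialize on the harder inputs that shallow exits could not resolve.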
📝 Abstract
Early-exit neural networks reduce inference cost by enabling confident predictions at intermediate layers. However, joint training often leads to gradient interference, with deeper classifiers dominating optimization. We propose Confidence-Gated Training (CGT), a paradigm that propagates gradients from deeper exits only when the preceding exits fail. This encourages shallow classifiers to act as primary decision points while reserving deeper layers for harder inputs. By aligning training with the inference-time policy, CGT mitigates overthinking, improves early-exit accuracy, and preserves efficiency. Experiments on the Indian Pines and Fashion-MNIST benchmarks show that CGT lowers average inference cost while improving overall accuracy, offering a practical solution for deploying deep models in resource-constrained environments.