Confidence-gated training for efficient early-exit neural networks

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Early-exit neural networks suffer from gradient interference during joint training, where deeper classifiers dominate shallow exits and undermine their effectiveness. To address this, we propose Confidence-Gated Training (CGT), a novel training paradigm that introduces conditional gradient propagation: gradients are backpropagated to deeper layers only when the current layer’s prediction confidence falls below a learnable threshold—thereby aligning training dynamics with inference-time early-exit behavior. CGT explicitly grants shallow classifiers priority in decision-making, reducing redundant computation. Extensive experiments on benchmarks including Indian Pines and Fashion-MNIST demonstrate that CGT significantly reduces average inference latency (up to 42%) while improving overall accuracy (+0.8%–1.3%) and early-exit accuracy (+3.1%–5.7%). These gains enhance deployment efficiency in resource-constrained environments without architectural modifications.

📝 Abstract
Early-exit neural networks reduce inference cost by enabling confident predictions at intermediate layers. However, joint training often leads to gradient interference, with deeper classifiers dominating optimization. We propose Confidence-Gated Training (CGT), a paradigm that conditionally propagates gradients from deeper exits only when preceding exits fail. This encourages shallow classifiers to act as primary decision points while reserving deeper layers for harder inputs. By aligning training with the inference-time policy, CGT mitigates overthinking, improves early-exit accuracy, and preserves efficiency. Experiments on the Indian Pines and Fashion-MNIST benchmarks show that CGT lowers average inference cost while improving overall accuracy, offering a practical solution for deploying deep models in resource-constrained environments.
Problem

Research questions and friction points this paper is trying to address.

Early-exit networks suffer from gradient interference during joint training
Deeper classifiers dominate optimization over shallow exit points
Training and inference policies are misaligned, reducing efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditionally propagates gradients from deeper exits
Encourages shallow classifiers as primary decision points
Aligns training with inference-time early-exit policy
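The gating rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `confidence_gate` and the toy batch are hypothetical, and the learnable threshold from the paper is simplified to a fixed scalar. The gate marks the samples whose shallow-exit confidence falls below the threshold; only those samples would let the deeper-exit loss backpropagate (e.g., by masking the deep loss or detaching features for confident samples in a real framework).

```python
import math

def softmax(logits):
    # Numerically stable softmax over one sample's logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def confidence_gate(batch_logits, threshold):
    """Per-sample gate: True where the shallow exit is NOT confident,
    i.e. where the deeper exits' loss should still backpropagate."""
    return [max(softmax(z)) < threshold for z in batch_logits]

# Toy batch of shallow-exit logits (hypothetical values).
batch = [
    [4.0, 0.0, 0.0],  # confident -> gradient stops at the shallow exit
    [0.1, 0.0, 0.2],  # uncertain -> deeper layers keep training on it
    [0.0, 5.0, 0.0],  # confident
]
gate = confidence_gate(batch, threshold=0.9)
# gate -> [False, True, False]: only the uncertain sample
# contributes gradient to the deeper classifiers.
```

Because the same confidence test decides both when to exit at inference and when to propagate gradients during training, the two policies stay aligned by construction.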
👥 Authors
Saad Mokssit (International University of Rabat, TICLab, Morocco)
Ouassim Karrakchou (International University of Rabat, TICLab, Morocco)
Alejandro Mousist (Thales Alenia Space, Tres Cantos, Spain)
Mounir Ghogho (University Mohammed VI Polytechnic)
Machine Learning · Signal Processing · Wireless Communication