Balancing Stability and Plasticity in Sequentially Trained Early-Exiting Neural Networks

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses interference in sequentially trained early-exit neural networks, where newly added exits can disrupt previously trained classifiers and degrade their accuracy. To mitigate this, the authors frame the problem as the stability-plasticity trade-off studied in continual learning and propose two alternative approaches. The first, Elastic Weight Consolidation (EWC), operates at the parameter level by protecting weights that are important for existing exits. The second, Learning without Forgetting (LwF), operates at the output level by preserving the output distributions of earlier exits. On standard benchmarks, both approaches consistently improve early-exit performance, achieving higher accuracy than existing sequential training methods and significant speedups at low computational budgets.
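The two regularizers named in the summary can be illustrated with a minimal sketch. It uses the standard textbook forms of the EWC quadratic penalty and the LwF distillation loss; how the paper attaches them to individual exits, and the values of `strength` and `temperature`, are not given here, so those names and defaults are assumptions made for illustration only.

```python
import math


def ewc_penalty(params, anchor_params, fisher, strength):
    """Standard EWC penalty: (strength / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current parameter values
    anchor_params -- values saved after the earlier exits were trained
    fisher        -- per-parameter importance (diagonal Fisher estimate)
    strength      -- hypothetical regularization weight
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def lwf_distillation(old_logits, new_logits, temperature=2.0):
    """LwF-style loss: cross-entropy between the softened outputs recorded
    from an earlier exit (targets) and that exit's current outputs."""
    targets = softmax(old_logits, temperature)
    preds = softmax(new_logits, temperature)
    return -sum(t * math.log(q) for t, q in zip(targets, preds))
```

In this sketch, the EWC term is zero when the parameters have not moved and grows fastest along directions with large Fisher values, while the LwF term is smallest when an earlier exit still produces the outputs it produced before the new exit was added.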

📝 Abstract

Early-exiting neural networks enable adaptive inference by allowing inputs to exit at intermediate classifiers, reducing computation for easy samples while maintaining high accuracy. In practice, exits can be trained sequentially by incrementally adding them to a shared backbone; however, this sequential training can cause newly introduced exits to interfere with previously learned ones, degrading the performance of earlier classifiers. We address this problem by retaining the knowledge embedded in existing exits while allowing new ones to specialize. We propose two alternative approaches that operate at different levels of the model. The first constrains learning by protecting parameters that are important for previously trained exits, while the second preserves the output distributions of earlier exits as the network adapts. These alternatives directly reflect the stability-plasticity trade-off studied in continual learning. Accordingly, we leverage Elastic Weight Consolidation to constrain critical weights and Learning without Forgetting to preserve output distributions. Experiments on standard benchmarks show that our approaches consistently improve early-exit performance, achieving higher accuracy than existing sequential training methods and significant speedups at low computational budgets.
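The adaptive inference described at the start of the abstract can be sketched as follows. The abstract does not state the exit criterion, so this example assumes a common one: stop at the first exit whose top softmax probability reaches a threshold. The function name `early_exit_predict`, the `stages` structure, and the default `threshold` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x, stages, threshold=0.9):
    """Run backbone stages in order and stop at the first confident exit.

    stages    -- list of (backbone_stage, exit_head) callables, shallow to deep
    threshold -- assumed confidence criterion, not taken from the paper
    Returns (predicted_class, exit_index).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    features = x
    for index, (backbone_stage, exit_head) in enumerate(stages):
        features = backbone_stage(features)  # deeper stages run only if needed
        probs = softmax(exit_head(features))
        confidence = max(probs)
        is_last = index == len(stages) - 1
        if confidence >= threshold or is_last:
            return probs.index(confidence), index
```

Because later backbone stages run only when earlier exits are not confident, easy inputs cost less computation. This is also why degraded early exits matter: if sequential training hurts them, more inputs fall through to deeper, more expensive exits.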

Problem

Research questions and friction points this paper is trying to address.

stability-plasticity trade-off

early-exiting neural networks

sequential training

catastrophic interference

continual learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Early-exiting neural networks

Sequential training

Stability-plasticity trade-off