🤖 AI Summary
In large-scale pre-trained model inference with early-exit mechanisms, the entanglement of representational and discriminative capabilities in shallow-layer features limits performance. Method: We propose a decoupled multi-predictor optimization framework that, for the first time, separates representational capacity (i.e., foundational feature reconstruction) from discriminative capacity (i.e., task-specific decision-making) at both architectural and training levels. Our approach introduces lightweight bypass modules and higher-order statistical predictors, coupled with a two-stage dynamic loss weighting strategy. It supports parameter-efficient fine-tuning and input-adaptive early exit during inference. Contribution/Results: Evaluated across multiple benchmarks and mainstream pre-trained models, our method significantly outperforms existing early-exit approaches—reducing computational cost by 30%–50% while improving accuracy. This demonstrates that explicit representational-discriminative decoupling effectively enhances both inference efficiency and predictive accuracy in tandem.
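The input-adaptive early exit described above can be sketched as a model with one predictor per stage, where inference stops as soon as a prediction is confident enough. This is a minimal illustrative sketch, not the paper's architecture: the stage/predictor shapes, the softmax-confidence criterion, and the threshold value are all assumptions.

```python
import torch
import torch.nn as nn

class EarlyExitModel(nn.Module):
    """Hypothetical sketch of input-adaptive early exit with multi-stage
    predictors. Stages and predictors are plain linear layers here purely
    for illustration; the paper's bypass modules and high-order statistics
    predictors are not reproduced."""

    def __init__(self, dim=64, num_classes=10, num_stages=4, threshold=0.9):
        super().__init__()
        self.stages = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_stages))
        self.predictors = nn.ModuleList(
            nn.Linear(dim, num_classes) for _ in range(num_stages)
        )
        self.threshold = threshold  # assumed confidence criterion

    @torch.no_grad()
    def forward(self, x):
        for i, (stage, predictor) in enumerate(zip(self.stages, self.predictors)):
            x = torch.relu(stage(x))
            probs = predictor(x).softmax(dim=-1)
            # Exit early once the max class probability clears the threshold,
            # or at the final stage regardless; later stages are skipped,
            # which is where the computational savings come from.
            if probs.max().item() >= self.threshold or i == len(self.stages) - 1:
                return probs, i
```

Easy inputs exit at shallow stages and pay for fewer layers; hard inputs fall through to deeper stages, so average cost drops without capping worst-case accuracy.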
📝 Abstract
Recently, remarkable progress has been made in tuning large-scale pre-trained models, and inference efficiency is becoming increasingly crucial for practical deployment. Early exiting with multi-stage predictors, combined with a parameter-efficient fine-tuning strategy, offers a straightforward way to obtain an inference-efficient model. However, a key challenge remains unresolved: how can early stages provide low-level foundational features to deep stages while simultaneously supplying high-level discriminative features to early-stage predictors? To address this problem, we propose a Decoupled Multi-Predictor Optimization (DMPO) method that effectively decouples the low-level representational ability and high-level discriminative ability of early stages. Architecturally, we introduce a lightweight bypass module into the multi-stage predictors to functionally decompose the shallow features from early stages, and we develop a high-order statistics-based predictor for early stages to enhance their discriminative ability. To train this multi-predictor architecture effectively, we propose a decoupled optimization that allocates two-phase loss weights to the multi-stage predictors during model tuning: the initial training phase lets the model prioritize the discriminative ability of deep stages by emphasizing the representational ability of early stages, while the latter training phase drives discriminative ability toward earlier stages as much as possible. As such, DMPO decouples representational and discriminative abilities in early stages through both architecture design and model optimization. Experiments across various datasets and pre-trained backbones demonstrate that DMPO clearly outperforms its counterparts while reducing computational cost.
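The two-phase loss weighting can be illustrated with a simple schedule over per-stage predictor weights. The concrete weight values and the phase-switch point below are assumptions for illustration, not the paper's actual schedule: phase 1 weights deep-stage predictors more heavily (so early stages focus on representational features), and phase 2 shifts weight toward early-stage predictors (pushing discriminative ability earlier).

```python
def dmpo_loss_weights(epoch, total_epochs, num_stages, switch_frac=0.5):
    """Illustrative two-phase loss-weight schedule for multi-stage predictors.
    Linear ramps and a 50% phase switch are assumed values, not the paper's."""
    if epoch < switch_frac * total_epochs:
        # Phase 1: weights grow with stage index -> deep predictors dominate,
        # leaving early stages free to learn representational features.
        raw = [i + 1 for i in range(num_stages)]
    else:
        # Phase 2: weights shrink with stage index -> early predictors dominate,
        # driving discriminative ability toward earlier stages.
        raw = [num_stages - i for i in range(num_stages)]
    total = sum(raw)
    return [w / total for w in raw]
```

During tuning, the total loss would then be a weighted sum over stage predictors, e.g. `sum(w * loss_i for w, loss_i in zip(weights, stage_losses))`, with the weights re-evaluated each epoch.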