🤖 AI Summary
This study investigates the recoverability of BERT-style models under layer-wise parameter corruption, specifically Gaussian noise injection or zeroing applied to individual transformer layers. Method: We systematically perturb individual layers and evaluate downstream performance degradation and recovery via task-specific fine-tuning on the GLUE benchmark. Contribution/Results: We identify, for the first time, that corrupting lower layers, which encode fundamental linguistic representations, induces the most severe and least reversible performance loss, dominating overall model robustness. Standard fine-tuning recovers only a fraction of the degraded performance, and recovery efficacy decays nonlinearly as corruption intensity increases. To formalize these observations, we propose a fine-grained corruption-recovery attribution framework that quantifies inter-layer sensitivity, revealing the boundary between a pre-trained model's adaptability and its resilience to interference. Our empirical findings and theoretical analysis provide actionable insights and foundational principles for designing robust NLP systems.
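The corruption scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the "model" below is a stand-in stack of weight matrices rather than real BERT parameters, and the `sigma` and `zero_fraction` knobs are hypothetical names for the corruption intensity.

```python
import numpy as np

def corrupt_layer(weights, mode="gaussian", sigma=0.1, zero_fraction=0.5, rng=None):
    """Return a corrupted copy of one layer's weight matrix.

    mode="gaussian": add zero-mean Gaussian noise with std `sigma`.
    mode="zero":     zero out a random `zero_fraction` of the entries.
    """
    rng = rng or np.random.default_rng(0)
    corrupted = weights.copy()
    if mode == "gaussian":
        corrupted += rng.normal(0.0, sigma, size=weights.shape)
    elif mode == "zero":
        mask = rng.random(weights.shape) < zero_fraction
        corrupted[mask] = 0.0
    else:
        raise ValueError(f"unknown mode: {mode}")
    return corrupted

# Toy "model": one weight matrix per transformer layer (12 layers, BERT-base style).
rng = np.random.default_rng(42)
model = [rng.normal(size=(8, 8)) for _ in range(12)]

# Corrupt only the bottom layer (index 0), leaving all other layers intact;
# the study's finding is that exactly this case is hardest to recover from.
perturbed = list(model)
perturbed[0] = corrupt_layer(model[0], mode="gaussian", sigma=0.5, rng=rng)
```

In the real experiment the perturbed parameters would be loaded back into the model, which is then fine-tuned on a GLUE task to measure how much of the original performance can be recovered.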
📝 Abstract
Language models like BERT excel at sentence classification tasks thanks to extensive pre-training on general-domain data, but their robustness to parameter corruption remains unexplored. To understand this better, we examine what happens when a language model is "broken", in the sense that some of its parameters are corrupted and then recovered by fine-tuning. Strategically corrupting BERT variants at different layers and intensities, we find that corrupted models struggle to fully recover their original performance, with heavier corruption causing more severe degradation. Notably, bottom-layer corruption, which disrupts fundamental linguistic features, is more detrimental than top-layer corruption. Our insights contribute to understanding language model robustness and adaptability under adverse conditions, informing strategies for developing NLP systems resilient to parameter perturbations.