Healing Powers of BERT: How Task-Specific Fine-Tuning Recovers Corrupted Language Models

📅 2024-06-20
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This study investigates the recoverability of BERT-style models under layer-wise parameter corruption, specifically Gaussian noise injection or zeroing applied to individual transformer layers. Method: We systematically perturb each layer and measure downstream performance degradation and recovery via task-specific fine-tuning on the GLUE benchmark. Contribution/Results: We show, for the first time, that corrupting the lower layers, which encode fundamental linguistic features, causes the most severe and least recoverable performance loss, dominating overall model robustness. Standard fine-tuning recovers only a fraction of the degraded performance, and recovery efficacy decays nonlinearly as corruption intensity increases. To formalize these observations, we propose a fine-grained corruption-recovery attribution framework that quantifies inter-layer sensitivity, delineating the boundary between a pre-trained model's adaptability and its resilience to interference. Our empirical findings and analysis provide actionable guidance for designing robust NLP systems.
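As a concrete illustration of the corruption procedure described above, here is a minimal sketch assuming PyTorch and HuggingFace `transformers`; the checkpoint name, noise scale `sigma`, and layer index are illustrative choices, not values taken from the paper.

```python
import torch
from transformers import BertForSequenceClassification

def corrupt_layer(model, layer_idx, mode="noise", sigma=0.1):
    """Corrupt every parameter of one BERT encoder layer in place."""
    layer = model.bert.encoder.layer[layer_idx]
    with torch.no_grad():
        for param in layer.parameters():
            if mode == "noise":
                # Gaussian noise injection: add N(0, sigma^2) to each weight
                param.add_(sigma * torch.randn_like(param))
            elif mode == "zero":
                # Zeroing corruption: wipe the layer's weights entirely
                param.zero_()

# Illustrative usage (bert-base-uncased has 12 encoder layers, 0 = bottom)
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
corrupt_layer(model, layer_idx=0, mode="noise", sigma=0.1)
# Fine-tune on a GLUE task afterwards and compare against the clean baseline.
```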

📝 Abstract
Language models like BERT excel at sentence classification tasks due to extensive pre-training on general data, but their robustness to parameter corruption is unexplored. To understand this better, we look at what happens if a language model is "broken", in the sense that some of its parameters are corrupted and then recovered by fine-tuning. Strategically corrupting BERT variants at different levels, we find that corrupted models struggle to fully recover their original performance, with higher corruption causing more severe degradation. Notably, bottom-layer corruption affecting fundamental linguistic features is more detrimental than top-layer corruption. Our insights contribute to understanding language model robustness and adaptability under adverse conditions, informing strategies for developing resilient NLP systems against parameter perturbations.
Problem

Research questions and friction points this paper is trying to address.

Investigates BERT's recovery from parameter corruption via fine-tuning
Assesses performance degradation under varying corruption levels
Compares impact of bottom-layer vs top-layer corruption
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-specific fine-tuning partially recovers corrupted BERT models
Strategic layer-wise corruption probes model robustness
Bottom-layer corruption harms performance the most (see the sketch after this list)
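A hedged sketch of the bottom-versus-top comparison, building on the `corrupt_layer` helper above; `evaluate_glue` and `finetune_on_glue` are hypothetical stand-ins for a standard HuggingFace evaluation and training loop, not functions from the paper's code.

```python
import copy

def recovery_experiment(base_model, layer_indices=(0, 11), sigma=0.1):
    """Corrupt one layer per run, fine-tune, and record task accuracy
    before and after recovery. Returns {layer_idx: (before, after)}."""
    results = {}
    for idx in layer_indices:
        model = copy.deepcopy(base_model)   # fresh copy per condition
        corrupt_layer(model, idx, mode="noise", sigma=sigma)
        before = evaluate_glue(model)       # degraded task accuracy
        finetune_on_glue(model)             # task-specific recovery step
        after = evaluate_glue(model)        # recovered task accuracy
        results[idx] = (before, after)
    return results

# Per the paper's finding, layer 0 (bottom, low-level linguistic features)
# should recover less of its original accuracy than layer 11 (top).
```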
👥 Authors
Shijie Han (Columbia University)
Zhenyu Zhang (Zhejiang University)
Andrei Arsene Simion (Columbia University)