Unified Enhancement of the Generalization and Robustness of Language Models via Bi-Stage Optimization

šŸ“… 2025-03-19
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ“„ PDF
šŸ¤– AI Summary
Balancing generalization and robustness remains a fundamental challenge for language models. This paper proposes UEGR, a two-stage optimization framework. During forward propagation, adaptive dropout generates diverse submodels, and Jensen–Shannon (JS) divergence regularization combined with an adversarial loss over their output distributions enforces output stability. During backward propagation, parameter saliency scores are computed and sparse gradient updates are applied only to the most critical parameters. The authors present UEGR as the first method to unify generalization and robustness improvement in a single theoretical framework, showing that it performs gradient regularization (reducing sensitivity to input perturbations) and flattens the loss landscape (improving generalization). Evaluated on 13 standard natural language understanding benchmarks, UEGR consistently outperforms existing methods and achieves new SOTA results on both generalization and robustness metrics.
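For intuition, here is a minimal PyTorch-style sketch of the forward-stage objective described in the summary: dropout is kept active so repeated passes over the adversarial inputs sample different submodels, and a JS-divergence term plus an adversarial cross-entropy term regularize the outputs. The names (`forward_stage_loss`, `num_submodels`, `lambda_js`, `lambda_adv`), the loss weighting, and the assumption that adversarial inputs are produced elsewhere are all illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def js_divergence(distributions):
    """Generalized Jensen-Shannon divergence among a list of probability distributions."""
    mixture = torch.stack(distributions, dim=0).mean(dim=0)
    return sum(
        F.kl_div(mixture.log(), p, reduction="batchmean") for p in distributions
    ) / len(distributions)

def forward_stage_loss(model, inputs, adv_inputs, labels,
                       num_submodels=2, lambda_js=1.0, lambda_adv=1.0):
    """Illustrative UEGR-style forward objective: task loss + adversarial loss
    + JS consistency across dropout submodels (hyperparameters are placeholders)."""
    model.train()  # keep dropout active: each pass samples a different submodel
    clean_logits = model(inputs)
    task_loss = F.cross_entropy(clean_logits, labels)          # standard task loss
    distributions = [F.softmax(clean_logits, dim=-1)]
    adv_loss = 0.0
    for _ in range(num_submodels):
        adv_logits = model(adv_inputs)                          # adversarial pass under a fresh dropout mask
        adv_loss = adv_loss + F.cross_entropy(adv_logits, labels)
        distributions.append(F.softmax(adv_logits, dim=-1))
    consistency = js_divergence(distributions)                  # align clean and adversarial submodel outputs
    return task_loss + lambda_adv * adv_loss / num_submodels + lambda_js * consistency
```

Keeping the model in train mode during the adversarial passes is what makes each pass a distinct dropout submodel; the JS term then penalizes disagreement among those submodels and the clean output.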

šŸ“ Abstract
Neural network language models (LMs) face significant challenges in generalization and robustness. Many existing studies improve either generalization or robustness in isolation, and few methods address both aspects simultaneously, which makes it difficult to develop LMs that are both robust and well generalized. In this paper, we propose a bi-stage optimization framework, termed UEGR, to uniformly enhance both the generalization and robustness of LMs. Specifically, during the forward propagation stage, we enrich the output probability distributions of adversarial samples by adaptive dropout to generate diverse submodels, and incorporate the JS divergence and adversarial losses of these output distributions to reinforce output stability. During the backward propagation stage, we compute parameter saliency scores and selectively update only the most critical parameters to minimize unnecessary deviations and consolidate the model's resilience. Theoretical analysis shows that our framework includes gradient regularization, which limits the model's sensitivity to input perturbations, and selective parameter updates, which flatten the loss landscape, thus improving both generalization and robustness. Experimental results show that our method significantly improves the generalization and robustness of LMs compared with existing methods across 13 publicly available language datasets, achieving state-of-the-art (SOTA) performance.
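As a rough illustration of the backward-propagation stage, the sketch below masks gradients so that only the most salient parameters are updated. The saliency score used here (|parameter × gradient|, a first-order Taylor estimate) and the `keep_ratio` threshold are assumptions made for illustration; the paper's exact scoring and selection rule may differ.

```python
import torch

def saliency_masked_step(model, loss, optimizer, keep_ratio=0.3):
    """Illustrative backward stage: update only the most salient parameters.
    Saliency is |param * grad| here; keep_ratio is a placeholder hyperparameter."""
    optimizer.zero_grad()
    loss.backward()
    for param in model.parameters():
        if param.grad is None:
            continue
        saliency = (param.detach() * param.grad).abs()      # per-element importance score
        k = max(1, int(keep_ratio * saliency.numel()))      # number of entries to keep
        threshold = saliency.flatten().kthvalue(saliency.numel() - k + 1).values
        mask = (saliency >= threshold).to(param.grad.dtype)
        param.grad.mul_(mask)                                # zero out non-critical gradients
    optimizer.step()
```

In use, one would compute the combined forward-stage loss and then call something like `saliency_masked_step(model, loss, torch.optim.AdamW(model.parameters()))`; restricting each step to high-saliency coordinates is what limits unnecessary parameter deviation.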
Problem

Research questions and friction points this paper is trying to address.

Enhancing LM generalization and robustness simultaneously
Addressing the lack of methods that improve both robustness and generalization
Proposing bi-stage optimization for stable and resilient LMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bi-stage optimization for LM generalization and robustness
Adaptive dropout and JS divergence in forward stage
Selective parameter updates in backward stage
šŸ”Ž Similar Papers
No similar papers found.
Yudao Sun
Department of New Networks, Peng Cheng Laboratory, Shenzhen 518055, China
Juan Yin
Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore 117576
Juan Zhao
Associate Professor of Bioinformatics, Shanghai University of Traditional Chinese Medicine
Fan Zhang
Department of New Networks, Peng Cheng Laboratory, Shenzhen 518055, China
Yongheng Liu
Department of New Networks, Peng Cheng Laboratory, Shenzhen 518055, China
Hongji Chen
Department of New Networks, Peng Cheng Laboratory, Shenzhen 518055, China