🤖 AI Summary
This study addresses the degradation of language model robustness under minor input perturbations. Methodologically, it introduces: (1) three distinct fine-tuning strategies for handling multiple perturbation types; (2) an integration of chain-of-thought (CoT) prompting with exemplar augmentation that extends robustness training to LLMs; and (3) an evaluation protocol built around the Tabular-NLI task for reproducible, fine-grained robustness assessment across diverse input perturbations. Experiments show that the proposed strategies improve robustness to multiple perturbation types while preserving accuracy on the original task. Notably, the study also examines cross-perturbation transfer, i.e., whether robustness acquired against one perturbation type carries over (or degrades performance) on others, offering both empirical insight and practical training recipes for trustworthy AI. The resulting framework systematically assesses models across scales, from pre-trained models to large language models, under fine-grained input disturbances.
📝 Abstract
Language models, characterized by their black-box nature, often hallucinate and display sensitivity to input perturbations, causing concerns about trust. To enhance trust, it is imperative to gain a comprehensive understanding of models' failure modes and develop effective strategies to improve their performance. In this study, we introduce a methodology designed to examine how input perturbations affect language models across various scales, including pre-trained models and large language models (LLMs). Utilizing fine-tuning, we enhance the model's robustness to input perturbations. Additionally, we investigate whether exposure to one perturbation enhances or diminishes the model's performance with respect to other perturbations. To address robustness against multiple perturbations, we present three distinct fine-tuning strategies. Furthermore, we broaden the scope of our methodology to encompass LLMs by leveraging a chain-of-thought (CoT) prompting approach augmented with exemplars. We employ the Tabular-NLI task to showcase how our proposed strategies adeptly train a robust model, enabling it to address diverse perturbations while maintaining accuracy on the original dataset.
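To make the CoT-with-exemplars idea concrete, below is a minimal sketch of how such a prompt might be assembled for a Tabular-NLI query. This is illustrative only, not the authors' implementation: the helper names, the table-linearization format, and all exemplar content are hypothetical assumptions, and the perturbed hypothesis (a character-level typo) merely stands in for the kinds of input perturbations the paper studies.

```python
# Illustrative sketch (not the paper's code): building a few-shot
# chain-of-thought prompt with exemplars for a Tabular-NLI query.
# Helper names, prompt format, and exemplar content are hypothetical.

def linearize_table(table: dict) -> str:
    """Flatten a key-value table into 'key : value' lines for the prompt."""
    return "\n".join(f"{k} : {v}" for k, v in table.items())

def build_cot_prompt(exemplars, premise_table, hypothesis):
    """Compose a prompt where each exemplar shows the table, a hypothesis,
    step-by-step reasoning, and the entailment label; the query is appended
    last with 'Reasoning:' left open for the model to complete."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Table:\n{linearize_table(ex['table'])}\n"
            f"Hypothesis: {ex['hypothesis']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Label: {ex['label']}\n"
        )
    parts.append(
        f"Table:\n{linearize_table(premise_table)}\n"
        f"Hypothesis: {hypothesis}\n"
        "Reasoning:"
    )
    return "\n".join(parts)

# One hypothetical exemplar with explicit reasoning and label.
exemplars = [{
    "table": {"Name": "Aspirin", "Dosage": "500 mg"},
    "hypothesis": "The dosage of Aspirin is 500 mg.",
    "reasoning": "The Dosage row states 500 mg, matching the hypothesis.",
    "label": "entailment",
}]

# Query hypothesis contains a character-level perturbation ("dosge"),
# the kind of minor input noise the robustness evaluation targets.
prompt = build_cot_prompt(
    exemplars,
    {"Name": "Ibuprofen", "Dosage": "200 mg"},
    "The dosge of Ibuprofen is 200 mg.",
)
print(prompt)
```

The exemplar's explicit reasoning step is what distinguishes this from plain few-shot prompting: it demonstrates to the LLM how to ground the label in specific table cells before answering, which is the mechanism the abstract leverages for robustness.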