Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) are highly sensitive to character- and word-level perturbations in task instructions, yet existing robustness work focuses mainly on data-level perturbations rather than instruction-level resilience. Method: The paper systematically investigates LLM stability and generalization under perturbed instructions, centering on a lightweight, annotation-free self-denoising strategy that requires no additional supervision and is compatible with both frozen-weight inference and fine-tuning. Contribution/Results: Evaluated across diverse models (Llama 3 and Flan-T5) and benchmarks (CoLA, QNLI, SST-2), self-denoising yields substantially larger average gains in downstream accuracy than alternative strategies, including ensembling and supervised baselines, and remains robust to spelling errors, synonym substitutions, and word-order permutations, offering a practical, scalable way to improve LLM reliability under real-world instruction variation.

📝 Abstract
Large Language Models (LLMs) are highly vulnerable to input perturbations, as even a small prompt change may result in a substantially different output. Existing methods to enhance LLM robustness are primarily focused on perturbed data samples, whereas improving resiliency to perturbations of task-level instructions has remained relatively underexplored. In this work, we focus on character- and word-level edits of task-specific instructions, which substantially degrade downstream performance. We experiment with a variety of techniques to enhance the robustness of LLMs, including self-denoising and representation alignment, testing different models (Llama 3 and Flan-T5), datasets (CoLA, QNLI, SST-2) and instructions (both task-oriented and role-oriented). We find that, on average, self-denoising -- whether performed by a frozen LLM or a fine-tuned model -- achieves substantially higher performance gains than alternative strategies, including more complex baselines such as ensembling and supervised methods.
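The self-denoising idea described above can be sketched as a two-step prompt pipeline: first ask the model to repair the possibly corrupted instruction, then run the task with the cleaned instruction. This is a minimal illustration, not the paper's implementation; the prompt wording and the `llm` callable (prompt in, text out) are assumptions, and the toy model below exists only so the sketch runs end to end.

```python
def denoise_instruction(instruction: str, llm) -> str:
    """Step 1: ask the model to rewrite a possibly corrupted instruction.

    `llm` is a hypothetical callable (prompt -> text); the paper's actual
    prompting setup is not reproduced here.
    """
    prompt = (
        "The following task instruction may contain typos or scrambled "
        "words. Rewrite it as a clean instruction, changing nothing else:\n"
        + instruction
    )
    return llm(prompt).strip()


def answer_with_self_denoising(instruction: str, sample: str, llm) -> str:
    """Step 2: run the task with the denoised instruction."""
    clean = denoise_instruction(instruction, llm)
    return llm(f"{clean}\n\nInput: {sample}\nAnswer:").strip()


def toy_llm(prompt: str) -> str:
    """Toy stand-in for a real LLM: fixes one known typo when asked to
    denoise, and labels sentiment by keyword lookup otherwise."""
    if "may contain typos" in prompt:
        return prompt.splitlines()[-1].replace("sentimetn", "sentiment")
    return "positive" if "great" in prompt.lower() else "negative"
```

With a real model, `toy_llm` would be replaced by a call into the serving stack; the two-step structure is the only part intended to mirror the paper's frozen-LLM self-denoising setting.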
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM robustness to perturbed instructions
Addressing performance drop from instruction-level perturbations
Comparing self-denoising with other robustness techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-denoising enhances LLM robustness
Representation alignment explored for instruction resilience
Character- and word-level edits quantify perturbation impact
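The character- and word-level perturbations evaluated in the paper can be sketched as simple seeded edit functions. These are illustrative assumptions (one adjacent-character swap and one word-order permutation), not the paper's exact perturbation suite:

```python
import random


def char_perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent letters inside words at the given rate -- one common
    character-level edit; the paper also covers e.g. spelling errors."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def word_shuffle(text: str, seed: int = 0) -> str:
    """Permute word order, a word-level perturbation."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)
```

Applying such edits to a task instruction (rather than to the input sample) reproduces the instruction-level setting the paper studies; the seed keeps perturbations reproducible across runs.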