Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) are highly sensitive to character- and word-level perturbations in task instructions, yet existing robustness work focuses mainly on data-level perturbations rather than instruction-level resilience. Method: The paper systematically investigates LLM stability and generalization under perturbed instructions, centering on a lightweight, annotation-free self-denoising strategy that requires no additional supervision and is compatible with both frozen-weight inference and fine-tuning. Contribution/Results: Evaluated across diverse models (Llama 3 and Flan-T5) and benchmarks (CoLA, QNLI, SST-2), self-denoising yields substantially larger average gains in downstream accuracy than alternative strategies, including ensembling and supervised baselines, and remains robust to spelling errors, synonym substitutions, and word-order permutations, offering a practical, scalable way to improve LLM reliability under real-world instruction variation.

📝 Abstract
Large Language Models (LLMs) are highly vulnerable to input perturbations, as even a small prompt change may result in a substantially different output. Existing methods to enhance LLM robustness are primarily focused on perturbed data samples, whereas improving resiliency to perturbations of task-level instructions has remained relatively underexplored. In this work, we focus on character- and word-level edits of task-specific instructions, which substantially degrade downstream performance. We experiment with a variety of techniques to enhance the robustness of LLMs, including self-denoising and representation alignment, testing different models (Llama 3 and Flan-T5), datasets (CoLA, QNLI, SST-2) and instructions (both task-oriented and role-oriented). We find that, on average, self-denoising -- whether performed by a frozen LLM or a fine-tuned model -- achieves substantially higher performance gains than alternative strategies, including more complex baselines such as ensembling and supervised methods.
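The self-denoising idea described above can be sketched as a two-step prompt pipeline: first ask the model to repair the possibly corrupted instruction, then run the task with the cleaned instruction. This is a minimal illustration, not the paper's implementation; the prompt wording and the `llm` callable (prompt in, text out) are assumptions, and the toy model below exists only so the sketch runs end to end.

```python
def denoise_instruction(instruction: str, llm) -> str:
    """Step 1: ask the model to rewrite a possibly corrupted instruction.

    `llm` is a hypothetical callable (prompt -> text); the paper's actual
    prompting setup is not reproduced here.
    """
    prompt = (
        "The following task instruction may contain typos or scrambled "
        "words. Rewrite it as a clean instruction, changing nothing else:\n"
        + instruction
    )
    return llm(prompt).strip()


def answer_with_self_denoising(instruction: str, sample: str, llm) -> str:
    """Step 2: run the task with the denoised instruction."""
    clean = denoise_instruction(instruction, llm)
    return llm(f"{clean}\n\nInput: {sample}\nAnswer:").strip()


def toy_llm(prompt: str) -> str:
    """Toy stand-in for a real LLM: fixes one known typo when asked to
    denoise, and labels sentiment by keyword lookup otherwise."""
    if "may contain typos" in prompt:
        return prompt.splitlines()[-1].replace("sentimetn", "sentiment")
    return "positive" if "great" in prompt.lower() else "negative"
```

With a real model, `toy_llm` would be replaced by a call into the serving stack; the two-step structure is the only part intended to mirror the paper's frozen-LLM self-denoising setting.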
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM robustness to perturbed instructions
Addressing performance drop from instruction-level perturbations
Comparing self-denoising with other robustness techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-denoising enhances LLM robustness
Representation alignment explored for instruction resilience
Character- and word-level edits quantify perturbation impact
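The character- and word-level perturbations evaluated in the paper can be sketched as simple seeded edit functions. These are illustrative assumptions (one adjacent-character swap and one word-order permutation), not the paper's exact perturbation suite:

```python
import random


def char_perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent letters inside words at the given rate -- one common
    character-level edit; the paper also covers e.g. spelling errors."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def word_shuffle(text: str, seed: int = 0) -> str:
    """Permute word order, a word-level perturbation."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)
```

Applying such edits to a task instruction (rather than to the input sample) reproduces the instruction-level setting the paper studies; the seed keeps perturbations reproducible across runs.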