Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models

📅 2026-05-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the sensitivity of large language models to minor input perturbations. Existing robustness methods enforce consistency at the whole-sequence level and can miss localized semantic drift, where a response stays globally similar to the clean one yet changes a critical entity, relation, or conclusion. To address this, the authors propose S²R², a framework that adds a segment-level robustness objective to LoRA fine-tuning. S²R² decomposes clean and perturbed generations into semantic segments, aligns them via optimal transport, and selectively penalizes the segments with the largest meaning drift. It also adds an adapter-stability regularizer, motivated by segment-level attention reallocation, that constrains LoRA norms as a tractable proxy for limiting perturbation-amplified evidence shift. A PAC-Bayesian complexity view suggests that controlling adapter growth may support generalization beyond the perturbations seen in training. Experiments on summarization benchmarks show that S²R² improves robustness to typographical errors, deletions, synonym substitutions, and paraphrasing, while maintaining competitive performance on clean inputs and transferring across datasets better than consistency-based baselines.
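The segment-level objective can be illustrated with a small sketch. This is not the authors' implementation: it assumes each clean and perturbed segment has already been embedded as a vector (how S²R² segments and embeds outputs is not described here), and the cosine cost, entropic Sinkhorn solver, and `top_k` worst-segment averaging are illustrative choices. The function names `sinkhorn` and `segment_drift_penalty` are hypothetical.

```python
import numpy as np

# Assumption: segments are pre-embedded vectors; the cosine cost, the entropic
# Sinkhorn solver, and top-k averaging are illustrative choices, not S²R²'s code.


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropic-regularized OT plan between histograms a and b (Sinkhorn iterations)."""
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]  # plan P = diag(u) K diag(v)


def segment_drift_penalty(clean_emb, pert_emb, top_k=1, eps=0.1):
    """Align clean/perturbed segment embeddings with OT; return (penalty, per-segment drift)."""
    c = clean_emb / np.linalg.norm(clean_emb, axis=1, keepdims=True)
    p = pert_emb / np.linalg.norm(pert_emb, axis=1, keepdims=True)
    cost = 1.0 - c @ p.T  # cosine distance, in [0, 2]
    a = np.full(len(c), 1.0 / len(c))  # uniform mass over clean segments
    b = np.full(len(p), 1.0 / len(p))  # uniform mass over perturbed segments
    plan = sinkhorn(cost, a, b, eps)
    # Drift of clean segment i: average cost of wherever its mass is transported.
    drift = (plan * cost).sum(axis=1) / plan.sum(axis=1)
    # Penalize only the top_k most-drifted segments, not the whole sequence.
    penalty = np.sort(drift)[-top_k:].mean()
    return penalty, drift


# Usage: three clean segments; the perturbed output reverses the meaning of segment 2.
clean = np.eye(3)
perturbed = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]])
penalty, drift = segment_drift_penalty(clean, perturbed)
print(int(np.argmax(drift)), round(float(penalty), 2))
```

Because the penalty looks only at the worst-aligned segments, a response that matches the clean output almost everywhere but flips one segment still receives a large penalty, whereas a sequence-level average would dilute that single drift across all segments. The two sides may also have different numbers of segments, since the transport plan does not require a one-to-one matching.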

📝 Abstract

Large language models are sensitive to minor prompt perturbations, yet existing robustness methods usually enforce consistency at the whole-sequence level. This holistic view can hide an important failure mode: a perturbed response may remain globally similar to the clean one while drifting on a critical entity, relation, or conclusion. We introduce S²R², a segment-level framework for robust LoRA fine-tuning. S²R² decomposes clean and perturbed generations into semantic segments, aligns them with an optimal-transport objective, and penalises the segments with the largest meaning drift. To connect this output-side objective with model adaptation, we add an adapter-stability regulariser motivated by segment-level attention reallocation, using LoRA norm control as a tractable proxy for limiting perturbation-amplified evidence shifts. A PAC-Bayesian complexity view further explains why controlling adapter growth may support transfer beyond observed perturbations. Experiments on summarisation benchmarks show that S²R² improves robustness under typographical noise, deletion, synonym replacement, and paraphrasing, while maintaining competitive clean performance and stronger cross-dataset transfer than consistency-based baselines.
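The adapter-stability regulariser and the PAC-Bayesian argument can be illustrated together. This is a minimal sketch, assuming the regulariser is the squared Frobenius norm of each LoRA update ΔW = (α/r)·BA; the abstract says only "LoRA norm control", so penalising the norms of A and B separately would be an equally plausible reading. The KL helper uses a standard identity: for a Gaussian prior centred at the pretrained weights and a Gaussian posterior centred at the adapted weights, both with covariance σ²I, the KL term that appears in PAC-Bayes bounds equals ‖ΔW‖²_F / (2σ²). Smaller adapter norms therefore mean a smaller complexity term. The names `total_loss`, `lam_seg`, and `lam_norm` are hypothetical.

```python
import numpy as np


def lora_delta(A, B, alpha, r):
    """LoRA weight update Delta W = (alpha / r) * B @ A, with B: (d_out, r), A: (r, d_in)."""
    return (alpha / r) * (B @ A)


def adapter_stability_penalty(adapters, alpha, r):
    """Assumed regulariser: sum of squared Frobenius norms of each layer's Delta W."""
    return float(sum(np.sum(lora_delta(A, B, alpha, r) ** 2) for A, B in adapters))


def gaussian_kl_shift(delta, sigma):
    """KL(N(w0 + delta, sigma^2 I) || N(w0, sigma^2 I)) = ||delta||_F^2 / (2 sigma^2)."""
    return float(np.sum(delta ** 2) / (2.0 * sigma ** 2))


def total_loss(task_loss, drift_penalty, adapters, alpha, r, lam_seg=1.0, lam_norm=1e-3):
    """Hypothetical combined objective: task loss + segment drift + adapter norm control."""
    return task_loss + lam_seg * drift_penalty + lam_norm * adapter_stability_penalty(adapters, alpha, r)


# Usage with one toy adapted layer: rank r = 2, d_in = 3, d_out = 4.
A = np.ones((2, 3))
B = np.ones((4, 2))
delta = lora_delta(A, B, alpha=2.0, r=2)
print(adapter_stability_penalty([(A, B)], alpha=2.0, r=2))  # 48.0
print(gaussian_kl_shift(delta, sigma=1.0))                  # 24.0
print(gaussian_kl_shift(2 * delta, sigma=1.0))              # 96.0: doubling Delta W quadruples the KL
```

The quadratic growth of the KL term in ‖ΔW‖ is what makes norm control a natural proxy here: an adapter that grows large to fit the observed perturbations pays a complexity cost that, in a PAC-Bayes bound, loosens the guarantee on unseen ones.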

Problem

Research questions and friction points this paper is trying to address.

prompt perturbations

robustness

semantic drift

LoRA-tuned language models

segment-level analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

segment-level robustness

LoRA fine-tuning

prompt perturbation