🤖 AI Summary
Large language models (LLMs) exhibit pronounced sensitivity to non-semantic variations in prompt formatting, commonly called prompt brittleness, which leads to substantial performance fluctuations. To address this, we propose Mixture of Formats (MOF), the first method to adapt style disentanglement, a technique originally developed in computer vision, to prompt robustness research. MOF constructs few-shot prompts via style-diverse sampling, explicitly decoupling semantic content from formatting at prompt-construction time. We introduce a cross-format generalization evaluation framework and validate MOF across multiple benchmarks. Results show that MOF significantly enhances LLM robustness to formatting perturbations, reducing performance variance by 47% on average and improving accuracy by 2.1%. Crucially, MOF requires no model fine-tuning or additional parameters, offering a lightweight, scalable paradigm for robust prompt engineering.
📝 Abstract
Large language models (LLMs) have gained popularity in recent years for their utility in a wide range of applications. However, they are sensitive to non-semantic changes in prompt format: small formatting changes can lead to significant performance fluctuations. In the literature, this problem is commonly referred to as prompt brittleness. Previous research on prompt engineering has focused mainly on techniques for identifying the optimal prompt for a specific task. Some studies have also examined prompt brittleness and proposed methods to quantify the resulting performance variation; however, no simple solution has yet been found. We propose Mixture of Formats (MOF), a simple and efficient technique that addresses prompt brittleness in LLMs by diversifying the styles of the few-shot examples in the prompt. MOF is inspired by computer vision techniques that use style-diverse datasets to prevent models from associating a specific style with the target variable. Empirical results show that MOF reduces style-induced prompt brittleness across various LLMs while also improving overall performance across prompt variations and datasets.
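The core idea above, rendering each few-shot example in a different format so the model cannot tie one surface style to the task, can be sketched in a few lines. This is an illustrative assumption of how such a prompt builder might look; the format templates and the `build_mof_prompt` helper below are hypothetical, not the paper's actual implementation:

```python
import random

# Hypothetical format templates; the paper's actual style set is not
# specified here, so these are illustrative assumptions.
FORMATS = [
    lambda q, a: f"Q: {q}\nA: {a}",
    lambda q, a: f"Question: {q}\nAnswer: {a}",
    lambda q, a: f"Input: {q} => Output: {a}",
    lambda q, a: f"{q}\n### Response: {a}",
]

def build_mof_prompt(examples, query, rng=random):
    """Render each few-shot (question, answer) pair in a distinct,
    randomly sampled style so no single format dominates the prompt."""
    styles = rng.sample(FORMATS, k=min(len(examples), len(FORMATS)))
    shots = [style(q, a) for style, (q, a) in zip(styles, examples)]
    # The final query reuses one sampled style with the answer left blank.
    shots.append(styles[0](query, "").rstrip())
    return "\n\n".join(shots)
```

Because diversification happens entirely at prompt-construction time, no model weights are touched, which is what keeps the approach lightweight.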