Spurious Prompts: Can Irrelevant Prompts Steer Large Language Models?

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether spurious, task-irrelevant prompts can systematically steer the behavior of large language models. The authors propose a black-box search method that automatically discovers such prompts, and they evaluate the prompts' impact across multiple reasoning and question-answering benchmarks. Experiments span three model families with parameter counts from 0.8B to 27B. They show that spurious prompts can improve model performance, often matching or outperforming standard prompting baselines and task-aware prompt optimization, while also inducing unintended behaviors, such as a preference for the first answer option or a tendency to return even, prime, or small numbers. These findings broaden our understanding of prompt sensitivity in large language models and highlight the risk that model behavior can be manipulated by cues unrelated to the task.

📝 Abstract

Large language models are highly sensitive to prompts, but this sensitivity is usually studied through task-relevant instructions, demonstrations, or reasoning cues. In this paper, we study a different form of prompt sensitivity: whether prompts that are semantically unrelated to the task can nevertheless steer model behavior. We call them spurious prompts and show their surprising efficacy. We also propose a simple black-box search procedure for discovering them. Across reasoning and question-answering benchmarks, using models ranging from 0.8B to 27B parameters and spanning three model families, we show that spurious prompts can improve performance, often matching or outperforming standard prompting baselines and task-aware prompt optimization. We further show that they can steer models toward unintended behaviors, such as repeatedly selecting the first answer option, producing incorrect answers, or returning an even, prime, or small number, all without the model being explicitly instructed to do so. These findings reveal a new kind of prompt sensitivity: LLMs can be systematically steered by prompts that are unrelated to the task they are asked to solve. Our code is available at https://github.com/Batorskq/spurious
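
The abstract does not spell out the search procedure, so the following is only a minimal sketch of what a black-box search over task-irrelevant prompts could look like, not the authors' method. It assumes plain random search over a fixed candidate pool and scores each candidate by dev-set accuracy using model outputs alone. All names here (`black_box_prompt_search`, `make_accuracy_scorer`, `toy_model`, the candidate sentences) are hypothetical and chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def black_box_prompt_search(
    candidates: Sequence[str],
    score: Callable[[str], float],
    budget: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Random search: score up to `budget` candidates and keep the best one.

    Only `score` touches the model, and it sees outputs only, so no
    gradients or internal states are needed (hence "black-box").
    """
    rng = random.Random(seed)
    sampled = rng.sample(list(candidates), k=min(budget, len(candidates)))
    # Start from the empty prompt so the result is never worse than no prefix.
    best_prompt, best_score = "", score("")
    for prompt in sampled:
        current = score(prompt)
        if current > best_score:
            best_prompt, best_score = prompt, current
    return best_prompt, best_score


def make_accuracy_scorer(
    model: Callable[[str], str],
    dev_set: List[Tuple[str, str]],
) -> Callable[[str], float]:
    """Build a scorer that prepends a prompt to each question and returns accuracy."""
    def score(prompt: str) -> float:
        correct = sum(
            model(f"{prompt}\n{question}".strip()) == answer
            for question, answer in dev_set
        )
        return correct / len(dev_set)
    return score


# Hypothetical stand-in for an LLM call: it answers correctly only when an
# unrelated trigger word is present, mimicking a spurious dependency.
def toy_model(text: str) -> str:
    return "4" if "lighthouse" in text else "0"


dev_set = [("What is 2 + 2?", "4"), ("What is 1 + 3?", "4")]
candidates = [
    "The lighthouse keeper painted the door green.",
    "Rain fell on the quiet harbor.",
    "Seven owls watched the train depart.",
]

best_prompt, best_score = black_box_prompt_search(
    candidates, make_accuracy_scorer(toy_model, dev_set), budget=3
)
print(best_prompt, best_score)
```

In a real setting, `toy_model` would be replaced by a call to the language model under test, and the candidate pool would be far larger. The same loop could also target an unintended behavior by swapping the accuracy scorer for one that measures, for example, how often the first answer option is chosen.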

Problem

Research questions and friction points this paper is trying to address.

spurious prompts

prompt sensitivity

large language models

unintended behaviors

task-irrelevant prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

spurious prompts

prompt sensitivity

black-box search