Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

📅 2026-01-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the pronounced sensitivity of large language models to prompt ordering in multiple-choice questions, a phenomenon whose underlying mechanism remains unclear. Through systematic architectural analysis, controlled experiments, and attention visualization, we reveal that the causal attention mask—when prompts follow the Question-Option-Context (QOC) order—prevents options from attending to the context, creating an information bottleneck that fundamentally degrades performance. We demonstrate that reordering prompts into Context-Question-Option (CQO) consistently alleviates this issue, yielding an average accuracy improvement of over 14 percentage points across diverse models and datasets. Our findings identify the architectural origin of prompt-order sensitivity and offer a simple yet effective remedy to enhance model robustness in multiple-choice reasoning tasks.

📝 Abstract
Large language models exhibit surprising sensitivity to prompt structure, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation into a striking case: in multiple-choice question answering, placing the context before the question and options (CQO) outperforms the reverse order (QOC) by over 14 percentage points, consistently across a wide range of models and datasets. Through systematic architectural analysis, we identify causal attention as the core mechanism: in QOC prompts, the causal mask prevents option tokens from attending to the context, creating an information bottleneck in which the context is invisible to the options.
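The mechanism can be illustrated with a minimal sketch (not the paper's code): under a causal mask, token i can only attend to tokens j ≤ i, so whether option tokens "see" the context depends entirely on segment order. The segment names and lengths below are hypothetical placeholders.

```python
# Toy illustration: which prompt segments can attend to each other under a
# causal attention mask, for QOC vs. CQO ordering. Lengths are placeholders.
SEGMENT_LENGTHS = {"Q": 4, "O": 6, "C": 10}  # question, options, context

def visibility(order, lengths=SEGMENT_LENGTHS):
    """Map each segment to the set of segments its tokens can attend to."""
    spans, pos = {}, 0
    for name in order:  # lay segments out in the given prompt order
        spans[name] = (pos, pos + lengths[name])
        pos += lengths[name]
    # Causal mask: token i attends to token j only if j <= i, so segment b
    # is visible to segment a iff b starts before a's last token.
    return {a: {b for b in order if spans[b][0] < spans[a][1]}
            for a in order}

# In QOC order, option tokens precede the context, so the causal mask
# blocks options from attending to the context entirely:
print("C" in visibility(["Q", "O", "C"])["O"])  # → False
# In CQO order, options follow both context and question:
print("C" in visibility(["C", "Q", "O"])["O"])  # → True
```

This is the information bottleneck the abstract describes: in QOC, only the final context tokens can integrate everything, while the option representations are computed blind to the context; reordering to CQO removes the bottleneck without any model changes.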
Problem

Research questions and friction points this paper is trying to address.

prompt order
causal attention
language models
multiple-choice question answering
information bottleneck
Innovation

Methods, ideas, or system contributions that make the work stand out.

causal attention
prompt order
information bottleneck
language models
context visibility
Hyunjong Ok
POSTECH, HJ AILAB
Jaeho Lee
POSTECH
Machine Learning · Efficient AI