🤖 AI Summary
This study addresses the pronounced sensitivity of large language models to prompt ordering in multiple-choice question answering, a phenomenon whose underlying mechanism has remained unclear. Through systematic architectural analysis, controlled experiments, and attention visualization, we show that the causal attention mask, when prompts follow the Question-Option-Context (QOC) order, prevents option tokens from attending to the context, creating an information bottleneck that fundamentally degrades performance. We demonstrate that reordering prompts into Context-Question-Option (CQO) consistently alleviates this issue, yielding an average accuracy improvement of over 14 percentage points across diverse models and datasets. Our findings identify the architectural origin of prompt-order sensitivity and offer a simple yet effective remedy for improving model robustness in multiple-choice reasoning tasks.
📝 Abstract
Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation of a striking case: in multiple-choice question answering, placing context before the question and options (CQO) outperforms the reverse order (QOC) by over 14%p, consistently across a wide range of models and datasets. Through systematic architectural analysis, we identify causal attention as the core mechanism: in QOC prompts, the causal mask prevents option tokens from attending to context tokens, which appear later in the sequence, creating an information bottleneck where the context is effectively invisible to the options.
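The bottleneck follows directly from how a causal mask works: position i may only attend to positions j ≤ i, so any segment placed after the options is unreachable from them. A minimal sketch (not the paper's code; segment lengths and labels are illustrative) makes this concrete:

```python
# Sketch: under a causal mask, position i attends only to positions j <= i.
# We label each token with its segment (Q=question, O=options, C=context)
# and check whether any option token can attend to any context token.

def options_see_context(order, seg_lens):
    """order: segment names in prompt order, e.g. ["Q", "O", "C"].
    seg_lens: toy token counts per segment."""
    labels = [seg for seg in order for _ in range(seg_lens[seg])]
    opt_positions = [i for i, s in enumerate(labels) if s == "O"]
    ctx_positions = [j for j, s in enumerate(labels) if s == "C"]
    # Causal attention: i can see j only if j <= i.
    return any(j <= i for i in opt_positions for j in ctx_positions)

lens = {"Q": 3, "O": 2, "C": 4}  # arbitrary toy sizes

print(options_see_context(["Q", "O", "C"], lens))  # QOC -> False: context hidden
print(options_see_context(["C", "Q", "O"], lens))  # CQO -> True: context visible
```

Under QOC every context position lies to the right of every option position, so no option token can attend to it; reordering to CQO restores visibility without changing the model at all, which is why a pure prompt-order change recovers accuracy.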