🤖 AI Summary
This study investigates the interaction between positional bias and linguistic diversity in large language models (LLMs), challenging the prevailing assumption that earlier tokens are more reliable. Method: Using entropy analysis, controlled positional prompting experiments, cross-lingual syntactic alignment evaluation, and systematic prompt perturbations, we examine Qwen2.5-7B across five typologically diverse languages: English, Russian, German, Hindi, and Vietnamese. Contribution/Results: We identify a late-position preference in Qwen2.5-7B, empirically falsifying the "earlier is better" consensus; demonstrate that explicit positional cues degrade accuracy; and reveal that LLMs impose a rigid dominant word order even in free-word-order languages, e.g., enforcing SOV in Hindi despite its flexibility. We also quantify language-specific positional-bias patterns and establish a non-monotonic relationship between uncertainty and positional bias, offering a novel empirical foundation for robust prompt engineering and model calibration.
📝 Abstract
Large language models exhibit positional bias -- systematic neglect of information at specific context positions -- yet its interplay with linguistic diversity remains poorly understood. We present a cross-linguistic study of five typologically distinct languages (English, Russian, German, Hindi, Vietnamese), examining how positional bias interacts with model uncertainty, syntax, and prompting. Key findings: (1) positional bias is model-driven, with language-specific variation -- Qwen2.5-7B favors late positions, challenging assumptions of early-token bias; (2) explicit positional guidance (e.g., "the correct context is at position X") reduces accuracy across languages, undermining common prompt-engineering practice; (3) aligning context with the model's positional bias increases entropy, yet minimal entropy does not predict accuracy; and (4) LLMs rigidly impose the dominant word order even in free-word-order languages such as Hindi.
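The controlled positional-prompting protocol described above can be sketched roughly as follows: place the gold passage at each context position among distractors, query the model, and record accuracy and the entropy of the answer distribution. The prompt template, the example fact, and the placeholder probability distribution below are all illustrative assumptions, not the paper's actual materials.

```python
import math

def build_prompt(gold_fact, distractors, position):
    """Insert the gold fact at a 0-indexed position among distractor passages."""
    docs = list(distractors)
    docs.insert(position, gold_fact)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    # Hypothetical template; the study's real prompts are language-specific.
    return f"Context:\n{context}\n\nQuestion: What is the capital of Elbonia?\nAnswer:"

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Sweep the gold fact across every context position. With a real model,
# `answer_probs` would be its next-token distribution for each prompt;
# here it is a stand-in that only shows where the measurement plugs in.
gold = "The capital of Elbonia is Kodos."           # hypothetical fact
distractors = [f"Unrelated filler sentence {i}." for i in range(4)]
for pos in range(len(distractors) + 1):
    prompt = build_prompt(gold, distractors, pos)
    answer_probs = [0.5, 0.3, 0.2]                  # stand-in distribution
    print(pos, round(entropy(answer_probs), 3))
```

Comparing accuracy and entropy across the swept positions is what reveals a late-position preference and the non-monotonic uncertainty relationship reported above.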