Trick or Neat: Adversarial Ambiguity and Language Model Evaluation

📅 2025-06-01

📈 Citations: 0

✨ Influential: 0

career value

164K/year

🤖 AI Summary

This study systematically evaluates the sensitivity and robustness of large language models (LLMs) to syntactic, lexical, and phonological ambiguity—three fundamental sources of linguistic uncertainty. Method: We propose the first adversarial evaluation paradigm covering all three ambiguity types, constructing a high-quality adversarial ambiguity dataset via word-order perturbations, synonym substitutions, and random modifications. We further introduce a multi-dimensional ambiguity annotation and evaluation framework. Using layer-wise representation analysis and linear probing, we probe how ambiguity information is encoded across model layers. Contribution/Results: We首次 demonstrate that LLMs encode ambiguity with high fidelity in intermediate hidden layers (decoding accuracy >90%), while standard prompting fails catastrophically—challenging prevailing assumptions about prompt engineering efficacy. Ambiguity representations exhibit systematic layer-wise distribution patterns. We publicly release the dataset and code, establishing a new benchmark for modeling linguistic uncertainty and advancing LLM robustness research.

Technology Category

Application Category

📝 Abstract

Detecting ambiguity is important for language understanding, including uncertainty estimation, humour detection, and processing garden path sentences. We assess language models' sensitivity to ambiguity by introducing an adversarial ambiguity dataset that includes syntactic, lexical, and phonological ambiguities along with adversarial variations (e.g., word-order changes, synonym replacements, and random-based alterations). Our findings show that direct prompting fails to robustly identify ambiguity, while linear probes trained on model representations can decode ambiguity with high accuracy, sometimes exceeding 90%. Our results offer insights into the prompting paradigm and how language models encode ambiguity at different layers. We release both our code and data: https://github.com/coastalcph/lm_ambiguity.

Problem

Research questions and friction points this paper is trying to address.

Assessing language models' sensitivity to ambiguity

Evaluating adversarial ambiguity detection methods

Analyzing how models encode ambiguity across layers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial ambiguity dataset creation

Linear probes for ambiguity decoding

Layer-wise ambiguity encoding analysis

🔎 Similar Papers

Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation