Trick or Neat: Adversarial Ambiguity and Language Model Evaluation

📅 2025-06-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study systematically evaluates the sensitivity and robustness of large language models (LLMs) to syntactic, lexical, and phonological ambiguity—three fundamental sources of linguistic uncertainty. Method: We propose the first adversarial evaluation paradigm covering all three ambiguity types, constructing a high-quality adversarial ambiguity dataset via word-order perturbations, synonym substitutions, and random modifications. We further introduce a multi-dimensional ambiguity annotation and evaluation framework. Using layer-wise representation analysis and linear probing, we probe how ambiguity information is encoded across model layers. Contribution/Results: We首次 demonstrate that LLMs encode ambiguity with high fidelity in intermediate hidden layers (decoding accuracy >90%), while standard prompting fails catastrophically—challenging prevailing assumptions about prompt engineering efficacy. Ambiguity representations exhibit systematic layer-wise distribution patterns. We publicly release the dataset and code, establishing a new benchmark for modeling linguistic uncertainty and advancing LLM robustness research.

Technology Category

Application Category

📝 Abstract
Detecting ambiguity is important for language understanding, including uncertainty estimation, humour detection, and processing garden path sentences. We assess language models' sensitivity to ambiguity by introducing an adversarial ambiguity dataset that includes syntactic, lexical, and phonological ambiguities along with adversarial variations (e.g., word-order changes, synonym replacements, and random-based alterations). Our findings show that direct prompting fails to robustly identify ambiguity, while linear probes trained on model representations can decode ambiguity with high accuracy, sometimes exceeding 90%. Our results offer insights into the prompting paradigm and how language models encode ambiguity at different layers. We release both our code and data: https://github.com/coastalcph/lm_ambiguity.
Problem

Research questions and friction points this paper is trying to address.

Assessing language models' sensitivity to ambiguity
Evaluating adversarial ambiguity detection methods
Analyzing how models encode ambiguity across layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial ambiguity dataset creation
Linear probes for ambiguity decoding
Layer-wise ambiguity encoding analysis
🔎 Similar Papers
No similar papers found.