Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often lack awareness of ambiguity in user queries and produce confidently incorrect answers. Method: This work is the first to identify that sparse shallow-layer neurons linearly encode question ambiguity during the prefilling phase; using internal representation analysis, linear probing, and neuron-level intervention, the authors precisely localize and manipulate these neurons. Contribution/Results: A single identified neuron suffices for accurate, cross-dataset-generalizable ambiguity detection, substantially outperforming baselines such as prompt engineering and representation-similarity metrics. Moreover, targeted suppression of this neuron reliably induces refusal behavior, demonstrating a compact, interpretable, and intervenable pathway for ambiguity perception and control. These findings reveal a fundamental, low-dimensional mechanism underlying LLMs' ambiguity sensitivity and offer a principled way to improve reliability through direct neural intervention.
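As a rough illustration of the linear-probing idea (not the paper's code), the sketch below fits a one-feature linear classifier on a synthetic stand-in for a single neuron's prefill activation. The data, the assumed mean shift for ambiguous questions, and the threshold rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one neuron's prefill activation per question:
# we ASSUME ambiguous questions shift the neuron's mean activation.
n = 400
labels = rng.integers(0, 2, size=n)              # 1 = ambiguous, 0 = unambiguous
acts = rng.normal(loc=2.0 * labels, scale=0.7)   # single-neuron activations

# On a single feature, a linear probe reduces to a learned threshold.
threshold = 0.5 * (acts[labels == 1].mean() + acts[labels == 0].mean())
preds = (acts > threshold).astype(int)
accuracy = (preds == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

In the paper's setting the activation would come from an identified Ambiguity-Encoding Neuron during prefilling, and the probe would be evaluated across datasets rather than on synthetic draws.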

📝 Abstract
Ambiguity is pervasive in real-world questions, yet large language models (LLMs) often respond with confident answers rather than seeking clarification. In this work, we show that question ambiguity is linearly encoded in the internal representations of LLMs and can be both detected and controlled at the neuron level. During the model's prefilling stage, we identify that a small number of neurons, as few as one, encode question-ambiguity information. Probes trained on these Ambiguity-Encoding Neurons (AENs) achieve strong performance on ambiguity detection and generalize across datasets, outperforming prompting-based and representation-based baselines. Layerwise analysis reveals that AENs emerge in shallow layers, suggesting early encoding of ambiguity signals in the model's processing pipeline. Finally, we show that by manipulating AENs, we can shift an LLM's behavior from direct answering to abstention. Our findings reveal that LLMs form compact internal representations of question ambiguity, enabling interpretable and controllable behavior.
Problem

Research questions and friction points this paper is trying to address.

Detecting question ambiguity from LLMs' internal neuron signals
Controlling LLM behavior, shifting from direct answering to abstention
Identifying the sparse neurons that encode ambiguity information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuron-level ambiguity detection and control
Probes on Ambiguity-Encoding Neurons outperform baselines
Manipulating sparse neurons controls answering behavior
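The neuron-suppression idea can be caricatured with a toy two-layer network in NumPy. The real intervention runs inside a transformer's forward pass (e.g. via an activation hook); the weights, layer sizes, and neuron index here are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))   # toy "shallow layer"
W2 = rng.normal(size=(16, 4))   # toy downstream layer
AEN_INDEX = 3                   # hypothetical ambiguity-encoding neuron

def forward(x, suppress_aen=False):
    h = x @ W1                   # hidden activations (prefill analogue)
    if suppress_aen:
        h = h.copy()
        h[..., AEN_INDEX] = 0.0  # zero out the target neuron
    return h @ W2

x = rng.normal(size=(2, 8))      # two dummy "questions"
y_normal = forward(x)
y_suppressed = forward(x, suppress_aen=True)
# The intervention changes downstream outputs, mimicking how suppressing
# an AEN shifts the model from direct answering toward abstention.
```

In practice the same edit is applied to a real model's hidden state (e.g. with a PyTorch forward hook on the identified layer) rather than to a random matrix product.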