Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics

📅 2026-03-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses a critical confound in interpreting superposition in neural representations: existing metrics misattribute activation overlap between homonyms—such as “bank” denoting either a financial institution or a riverside—to genuine concept compression, conflating lexical form with semantic distinction. Through a systematic 2×2 factorial design that disentangles lexical identity from semantic content, the work demonstrates for the first time that polysemy, rather than true superposition, is the primary driver of observed activation overlap. Experiments across models ranging from 110M to 70B parameters reveal that 18–36% of sparse autoencoder features conflate distinct word senses, predominantly within the top ≤1% most active dimensions. Removing this confound significantly improves word-sense disambiguation performance and enhances the selectivity of knowledge editing interventions (p=0.002).
📝 Abstract
If the same neuron activates for both "lender" and "riverside," standard metrics attribute the overlap to superposition: the neuron must be compressing two unrelated concepts. This work explores how much of the overlap is due to a lexical confound: neurons fire for a shared word form (such as "bank") rather than for two compressed concepts. A 2×2 factorial decomposition reveals that the lexical-only condition (same word, different meaning) consistently exceeds the semantic-only condition (different word, same meaning) across models spanning 110M–70B parameters. The confound carries into sparse autoencoders (18–36% of features blend senses), sits in ≤1% of activation dimensions, and hurts downstream tasks: filtering it out improves word sense disambiguation and makes knowledge edits more selective (p = 0.002).
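The 2×2 factorial decomposition described above can be illustrated with a toy sketch (not the paper's code): activation vectors for contexts are built from hypothetical sense and word-form components, and cosine overlap is compared across the four cells crossing lexical identity with semantic identity. All vectors here are synthetic random data for illustration only.

```python
# Illustrative sketch of a 2x2 factorial overlap comparison, assuming toy
# activations composed of a sense component plus a shared word-form component.
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two activation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical components: two senses and one shared word form ("bank").
sense_fin = rng.normal(size=64)   # financial-sense component
sense_riv = rng.normal(size=64)   # river-sense component
form_bank = rng.normal(size=64)   # shared lexical component for "bank"

bank_fin  = sense_fin + form_bank              # "bank" (financial sense)
bank_riv  = sense_riv + form_bank              # "bank" (river sense)
lender    = sense_fin + rng.normal(size=64)    # synonym, financial sense
riverside = sense_riv + rng.normal(size=64)    # synonym, river sense

cells = {
    "both (same word, same sense)":          cosine(bank_fin, bank_fin),
    "lexical-only (same word, diff sense)":  cosine(bank_fin, bank_riv),
    "semantic-only (diff word, same sense)": cosine(bank_fin, lender),
    "neither (diff word, diff sense)":       cosine(bank_fin, riverside),
}
for cell, sim in cells.items():
    print(f"{cell}: {sim:.2f}")
```

In this toy setup the lexical-only cell shows substantial overlap even though the two senses are unrelated, which is the confound the paper argues standard superposition metrics misread as concept compression.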
Problem

Research questions and friction points this paper is trying to address.

polysemy
superposition
lexical confound
word sense disambiguation
neural activation
Innovation

Methods, ideas, or system contributions that make the work stand out.

polysemy
superposition
lexical confound
sparse autoencoders
word sense disambiguation
Iyad Ait Hou
Department of Computer Science, George Washington University, Washington, D.C., USA
Rebecca Hwa
Computer Science, The George Washington University
computational linguistics