🤖 AI Summary
This study addresses the gap between co-occurrence-based word embeddings and the multimodal sensorimotor experiences underpinning human language comprehension. To bridge this divide, we propose the SENSE model, a neural network architecture that predicts Lancaster sensorimotor norms from word embeddings and is validated through behavioral experiments assessing alignment between model outputs and human judgments. Our work establishes, for the first time, systematic links between word embeddings and eleven sensory modalities—including interoception—and demonstrates significant correlations between model predictions and human selection rates across six of these modalities. Notably, we uncover systematic phonosemantic regularities within the interoceptive dimension, offering a novel pathway to infer phonosemantic patterns solely from textual data.
📝 Abstract
While word embeddings derive meaning from co-occurrence patterns, human language understanding is grounded in sensory and motor experience. We present $\text{SENSE}$ $(\textbf{S}\text{ensorimotor }$ $\textbf{E}\text{mbedding }$ $\textbf{N}\text{orm }$ $\textbf{S}\text{coring }$ $\textbf{E}\text{ngine})$, a learned projection model that predicts Lancaster sensorimotor norms from word embeddings. We also conducted a behavioral study in which 281 participants selected which of several candidate nonce words best evoked specific sensorimotor associations, finding statistically significant correlations between human selection rates and $\text{SENSE}$ ratings across 6 of the 11 modalities. Sublexical analysis of these nonce-word selection rates revealed systematic phonesthemic patterns for the interoceptive norm, suggesting a path toward computationally proposing candidate phonesthemes from text data.
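To make the core idea concrete, the sketch below shows one minimal way a learned projection from word embeddings to the 11 Lancaster sensorimotor dimensions could look. This is purely illustrative: the paper does not specify the SENSE architecture here, so a single ridge-regression layer on synthetic data stands in for the actual neural network, and all dimensions and variable names are assumptions.

```python
import numpy as np

# Illustrative sketch only: a linear projection from word embeddings to the
# 11 Lancaster sensorimotor dimensions (6 perceptual + 5 action-effector
# norms). SENSE itself is a learned neural projection; this closed-form
# ridge regression is a hypothetical stand-in, trained on toy data.

rng = np.random.default_rng(0)

EMBED_DIM = 300  # typical static-embedding size (assumption)
NORM_DIM = 11    # one output per Lancaster sensorimotor dimension

# Toy "lexicon": random embeddings X and simulated human norm ratings Y.
n_words = 500
X = rng.normal(size=(n_words, EMBED_DIM))
true_W = rng.normal(size=(EMBED_DIM, NORM_DIM))
Y = X @ true_W + 0.1 * rng.normal(size=(n_words, NORM_DIM))

# Closed-form ridge solution: W = (X^T X + lam * I)^(-1) X^T Y
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(EMBED_DIM), X.T @ Y)

# Score a new (e.g. nonce-word-like) embedding on all 11 dimensions.
pred = rng.normal(size=(1, EMBED_DIM)) @ W
print(pred.shape)  # (1, 11)
```

In the behavioral study, predicted ratings like `pred` would be compared against participants' selection rates per modality; here the mapping is linear only for brevity, not because the paper's model is.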