🤖 AI Summary
This study investigates whether large language models (LLMs), which lack embodied experience, can approximate human use of sensory language such as taste, sound, and pain. Using an 18K-story parallel corpus of human and model responses to short-story prompts, the authors systematically evaluate 18 popular LLMs. All models diverge significantly from human sensory-word usage, but the direction of the divergence varies by family: Gemini-family models consistently overgenerate sensory terms along most axes, while most models from the remaining five families underuse them. The methodology combines cross-model batched generation, statistical significance testing, linear probing across five models, and controlled ablation experiments. The probes suggest that LLMs can recognize sensory language internally yet fail to fully activate this capability during generation, and preliminary evidence indicates that instruction tuning may further suppress sensory expression, challenging the intuition that stronger models are more human-like. To support research on embodied language, the authors publicly release the expanded 18K story dataset as a benchmark resource.
📝 Abstract
Sensory language expresses embodied experiences ranging from taste and sound to excitement and stomachache. This language is of interest to scholars from a wide range of domains, including robotics, narratology, linguistics, and cognitive science. In this work, we explore whether language models, which are not embodied, can approximate human use of embodied language. We extend an existing corpus of parallel human and model responses to short story prompts with an additional 18,000 stories generated by 18 popular models. We find that all models generate stories that differ significantly from human usage of sensory language, but that the direction of these differences varies considerably between model families. Namely, Gemini models use significantly more sensory language than humans along most axes, whereas most models from the remaining five families use significantly less. Linear probes run on five models suggest that they are capable of identifying sensory language. However, we find preliminary evidence that instruction tuning may discourage the use of sensory language. Finally, to support further work, we release our expanded story dataset.
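The linear-probing setup mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the hidden states here are synthetic stand-ins generated with NumPy, whereas the study would extract real intermediate-layer representations from the evaluated models. The dimensionality, class balance, and separation strength are all assumptions for the sake of a runnable example.

```python
# Hypothetical linear probe for sensory-word identification.
# Hidden states are simulated; in practice they would come from
# an LLM's intermediate layers for sensory vs. non-sensory tokens.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d = 64  # assumed hidden-state dimensionality

# Simulate representations: "sensory" tokens are shifted along a
# fixed direction in activation space, a common probing assumption.
direction = rng.normal(size=d)
X_plain = rng.normal(size=(500, d))
X_sensory = rng.normal(size=(500, d)) + 1.5 * direction
X = np.vstack([X_plain, X_sensory])
y = np.array([0] * 500 + [1] * 500)  # 1 = sensory word

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)
print(f"probe accuracy: {acc:.2f}")
```

If a simple logistic-regression probe separates the two classes well above chance, that is evidence the representations encode the sensory/non-sensory distinction, which is the sense in which the abstract says the models are "capable of identifying sensory language" even when their generations underuse it.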