From Associations to Activations: Comparing Behavioral and Hidden-State Semantic Geometry in LLMs

📅 2026-01-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks whether the behavior of large language models (LLMs) reflects the semantic geometry of their internal hidden states. Using two psycholinguistic paradigms, similarity-based forced choice and free association, the authors collect behavioral data from eight instruction-tuned Transformer models over a shared 5,000-word vocabulary and build behavior-based similarity matrices. These matrices are compared with layerwise hidden-state similarity via representational similarity analysis (RSA). The work presents the first systematic comparison of the two paradigms for recovering an LLM's internal semantic geometry and finds that forced choice aligns substantially more closely with hidden-state geometry than free association does. In a held-out-words regression, behavioral similarity (especially from forced choice) predicts hidden-state similarities for unseen words beyond lexical baselines (FastText, BERT) and cross-model consensus, indicating that behavior-only measurements retain recoverable information about internal representations.
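As a rough illustration of how forced-choice trials could be turned into a behavior-based similarity matrix, the sketch below counts how often a word was chosen for a cue relative to how often the pair was presented. The trial format `(cue, options, pick)`, the function name `forced_choice_similarity`, and the choice-rate estimator are assumptions made for this example; the summary does not specify the paper's actual construction.

```python
import numpy as np

def forced_choice_similarity(trials, vocab):
    """Estimate similarity as the choice rate for each (cue, option) pair.

    Assumed (hypothetical) trial format: (cue, options, pick), where `pick`
    is the option the model judged most similar to `cue`.
    """
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    chosen = np.zeros((n, n))  # times the pair was selected
    shown = np.zeros((n, n))   # times the pair was presented

    for cue, options, pick in trials:
        c = index[cue]
        for option in options:
            o = index[option]
            shown[c, o] += 1
            shown[o, c] += 1  # treat similarity as symmetric
        p = index[pick]
        chosen[c, p] += 1
        chosen[p, c] += 1

    # Pairs that were never presented together stay NaN (unknown).
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = np.where(shown > 0, chosen / shown, np.nan)
    np.fill_diagonal(sim, 1.0)
    return sim

vocab = ["cat", "dog", "car"]
trials = [
    ("cat", ("dog", "car"), "dog"),
    ("cat", ("dog", "car"), "dog"),
    ("dog", ("cat", "car"), "cat"),
]
sim = forced_choice_similarity(trials, vocab)
```

With a 5,000-word vocabulary most pairs would likely be sampled only a few times, so a real pipeline would need some smoothing or a sampling design; this sketch leaves unsampled pairs as NaN rather than guessing.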

📝 Abstract

We investigate the extent to which an LLM's hidden-state geometry can be recovered from its behavior in psycholinguistic experiments. Across eight instruction-tuned transformer models, we run two experimental paradigms -- similarity-based forced choice and free association -- over a shared 5,000-word vocabulary, collecting 17.5M+ trials to build behavior-based similarity matrices. Using representational similarity analysis, we compare behavioral geometries to layerwise hidden-state similarity and benchmark against FastText, BERT, and cross-model consensus. We find that forced-choice behavior aligns substantially more with hidden-state geometry than free association. In a held-out-words regression, behavioral similarity (especially forced choice) predicts unseen hidden-state similarities beyond lexical baselines and cross-model consensus, indicating that behavior-only measurements retain recoverable information about internal semantic geometry. Finally, we discuss implications for the ability of behavioral tasks to uncover hidden cognitive states.
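The layerwise comparison described in the abstract can be sketched as a standard RSA computation: build a cosine-similarity matrix from each layer's hidden states, then rank-correlate its upper triangle with the behavioral matrix. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the use of cosine similarity, and Spearman correlation as the RSA statistic are all choices made for this example.

```python
import numpy as np

def cosine_similarity_matrix(hidden):
    """Pairwise cosine similarity for a (words x dims) hidden-state array."""
    unit = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    return unit @ unit.T

def average_ranks(x):
    """Ranks starting at 1, with tied values given their average rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def rsa_score(sim_a, sim_b):
    """Spearman correlation between the upper triangles of two matrices."""
    rows, cols = np.triu_indices_from(sim_a, k=1)
    a, b = sim_a[rows, cols], sim_b[rows, cols]
    valid = np.isfinite(a) & np.isfinite(b)  # skip pairs with no estimate
    ranks_a = average_ranks(a[valid])
    ranks_b = average_ranks(b[valid])
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

def layerwise_rsa(behavior_sim, hidden_by_layer):
    """One RSA score per layer; hidden_by_layer is a list of 2-D arrays."""
    return [rsa_score(behavior_sim, cosine_similarity_matrix(h))
            for h in hidden_by_layer]

# Synthetic check: the "behavior" is built from layer 1, so layer 1 scores ~1.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(20, 8)) for _ in range(3)]
behavior = cosine_similarity_matrix(layers[1])
scores = layerwise_rsa(behavior, layers)
```

Only the off-diagonal upper triangle is correlated because similarity matrices are symmetric and the diagonal is trivially 1; including either would inflate the score.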

Problem

Research questions and friction points this paper is trying to address.

semantic geometry

large language models

behavioral experiments

hidden states

representational similarity

Innovation

Methods, ideas, or system contributions that make the work stand out.

representational similarity analysis

behavioral semantics

hidden-state geometry