🤖 AI Summary
This work investigates whether large language models (LLMs) possess the semantic structure-mapping capability central to human analogical reasoning, i.e., the ability to recognize and transfer abstract relational structure across domains rather than merely reproduce surface-level associations observed during training.
Method: We introduce a benchmark of analogical reasoning tasks built from semantically meaningful symbols (natural language words) rather than purely abstract tokens, requiring the transfer of semantic structure and content from language to non-linguistic domains, an ability largely neglected by prior letter-string-style evaluations. The benchmark pairs human behavioral experiments with zero-shot and few-shot evaluations of state-of-the-art LLMs (e.g., GPT, Claude).
Contribution/Results: Advanced LLMs match human performance across most task variations, but humans and models diverge on certain variations and in the presence of semantic distractors, indicating that LLMs approach, yet do not fully replicate, human analogical cognition. The study provides a systematic characterization of where LLM analogical reasoning succeeds and where it breaks down in semantically rich contexts.
📝 Abstract
Analogical reasoning is considered core to human learning and cognition. Recent studies have compared the analogical reasoning abilities of human subjects and Large Language Models (LLMs) on abstract symbol manipulation tasks, such as letter string analogies. However, these studies largely neglect analogical reasoning over semantically meaningful symbols, such as natural language words. This ability to draw analogies that link language to non-linguistic domains, which we term semantic structure-mapping, is thought to play a crucial role in language acquisition and broader cognitive development. We test human subjects and LLMs on analogical reasoning tasks that require the transfer of semantic structure and content from one domain to another. Advanced LLMs match human performance across many task variations. However, humans and LLMs respond differently to certain task variations and semantic distractors. Overall, our data suggest that LLMs are approaching human-level performance on these important cognitive tasks, but are not yet entirely human-like.