🤖 AI Summary
Large language models struggle with analogical reasoning when surface cues conflict with underlying structural relationships, revealing limitations in their capacity for abstract generalization. This study systematically evaluates open-source large language models on narrative and rhetorical analogy tasks using both representational probing and standard prompting methods. The findings show that both approaches perform poorly and comparably on narrative analogies, yet probing significantly outperforms prompting on rhetorical analogies. This task-dependent asymmetry between internal representations and overt behavior underscores the unique advantage of probing techniques in uncovering latent knowledge within models. The results offer a novel perspective on the mechanisms underlying analogical reasoning in large language models and highlight the importance of representation-level analysis in assessing their cognitive capabilities.
📝 Abstract
Analogical reasoning is a core cognitive faculty essential for narrative understanding. While LLMs perform well when surface and structural cues align, they struggle when an analogy is not apparent on the surface but requires latent information, suggesting limitations in abstraction and generalization. In this paper, we compare a model's probed representations with its prompted performance at detecting narrative analogies, revealing an asymmetry: for rhetorical analogies, probing significantly outperforms prompting in open-source models, while for narrative analogies, both achieve similarly low performance. This suggests that the relationship between internal representations and prompted behavior is task-dependent and may reflect limitations in how prompting accesses available information.
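The representational probing contrasted with prompting above typically means training a lightweight classifier on a model's hidden states. The following is a minimal, self-contained sketch of that idea: it trains a logistic-regression probe on synthetic vectors with a planted linear signal. In the actual study the features would be LLM hidden states for passage pairs, and the probe architecture and layer choice are assumptions here, not details from the paper.

```python
import numpy as np

# Hypothetical sketch of a linear probe for analogy detection.
# In the paper's setting, X would hold LLM hidden states for passage
# pairs; here we use random features with a planted signal so the
# example runs standalone.
rng = np.random.default_rng(0)
d = 64    # assumed hidden-state dimensionality
n = 400   # number of passage pairs

w_true = rng.normal(size=d)            # planted "analogy" direction
X = rng.normal(size=(n, d))            # stand-in hidden states
y = (X @ w_true > 0).astype(float)     # 1 = analogous, 0 = not

# Logistic-regression probe trained with plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probability
    w -= 0.1 * X.T @ (p - y) / n         # gradient step on log loss

acc = ((X @ w > 0) == (y == 1)).mean()
print(f"probe accuracy: {acc:.2f}")
```

If the probe recovers the label from hidden states while prompting the same model fails, that gap is the probing-over-prompting asymmetry the abstract reports for rhetorical analogies.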