🤖 AI Summary
This study investigates whether large language models genuinely grasp the logical semantics of relations, such as symmetry and inverse relationships, in relational tasks, and disentangles whether failures on relation reversal stem from a lack of semantic understanding or from the sequential bias inherent in autoregressive generation. To this end, we construct a synthetic data framework grounded in knowledge graphs and train GPT-style autoregressive models from scratch to systematically evaluate their memorization, reasoning, and generalization capabilities. Through order-matched sequence tests and comparisons with diffusion-model baselines, we demonstrate for the first time that reversal failures are primarily attributable to generation-order bias rather than to a semantic deficit. Furthermore, we find that relational semantics can emerge abruptly in shallow models given sufficient logic-bearing supervision, and that successful generalization correlates strongly with stable signal representations in intermediate layers.
📝 Abstract
Autoregressive LLMs perform well on relational tasks that require linking entities via relational words (e.g., father/son, friend), but it is unclear whether they learn the logical semantics of such relations (e.g., symmetry and inversion) and, if so, whether reversal-type failures arise from missing relational semantics or from left-to-right order bias. To address these questions, we propose a controlled knowledge-graph-based synthetic framework that generates text from symmetric/inverse triples, train GPT-style autoregressive models from scratch, and evaluate memorization, logical inference, and in-context generalization to unseen entities. We find a sharp phase transition in which relational semantics emerge under sufficient logic-bearing supervision, even in shallow (2-3 layer) models, and that successful generalization aligns with stable intermediate-layer signals. Finally, order-matched forward/reverse tests and a diffusion baseline indicate that reversal failures are driven primarily by autoregressive order bias rather than by deficient inversion semantics.
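The synthetic-data construction described above (text generated from symmetric and inverse knowledge-graph triples) could be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the relation names, sentence templates, and entity names are all hypothetical.

```python
# Hypothetical sketch: emit a surface sentence for each KG triple plus the
# sentence entailed by its relation's logic (symmetry or inversion).
# All relation names and templates here are illustrative assumptions.

SYMMETRIC = {"friend"}        # r(a, b) entails r(b, a)
INVERSE = {"father": "son"}   # r(a, b) entails r_inv(b, a)

def sentences_from_triple(head: str, relation: str, tail: str) -> list[str]:
    """Return the forward sentence plus its logical consequence, if any."""
    out = [f"{head} is the {relation} of {tail}."]
    if relation in SYMMETRIC:
        # Symmetric relation: swap the entities, keep the relation word.
        out.append(f"{tail} is the {relation} of {head}.")
    elif relation in INVERSE:
        # Inverse relation: swap the entities and use the inverse word.
        out.append(f"{tail} is the {INVERSE[relation]} of {head}.")
    return out

# Build a toy training corpus from a handful of triples.
corpus = []
for triple in [("Alice", "friend", "Bob"), ("Carl", "father", "Dan")]:
    corpus.extend(sentences_from_triple(*triple))
```

Holding some entailed sentences out of the corpus, rather than emitting both, is what would turn memorization checks into the logical-inference and reversal tests the abstract describes.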