🤖 AI Summary
This study investigates whether large language models genuinely grasp the logical semantics of relations, such as symmetry and inverse relationships, in relational tasks, and disentangles whether failures on relation reversal stem from a lack of semantic understanding or from the sequential bias inherent in autoregressive generation. To this end, we construct a synthetic data framework grounded in knowledge graphs and train GPT-style autoregressive models from scratch to systematically evaluate their memorization, reasoning, and generalization capabilities. Through order-matched sequence tests and comparisons with diffusion-model baselines, we demonstrate for the first time that reversal failures are primarily attributable to generation-order bias rather than to a semantic deficit. Furthermore, we find that relational semantics can emerge abruptly in shallow models given sufficient logic-bearing supervision, and that successful generalization correlates strongly with stable signal representations in intermediate layers.
📝 Abstract
Autoregressive LLMs perform well on relational tasks that require linking entities via relational words (e.g., father/son, friend), but it is unclear whether they learn the logical semantics of such relations (e.g., symmetry and inversion) and, if so, whether reversal-type failures arise from missing relational semantics or from left-to-right order bias. To address these questions, we propose a controlled knowledge-graph-based synthetic framework that generates text from symmetric/inverse triples, train GPT-style autoregressive models from scratch, and evaluate memorization, logical inference, and in-context generalization to unseen entities. We find a sharp phase transition in which relational semantics emerge under sufficient logic-bearing supervision, even in shallow (2-3 layer) models, and that successful generalization aligns with stable intermediate-layer signals. Finally, order-matched forward/reverse tests and a diffusion baseline indicate that reversal failures are driven primarily by autoregressive order bias rather than by deficient inversion semantics.
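The synthetic-data construction described above (text generated from symmetric and inverse knowledge-graph triples) could be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the relation names, sentence templates, and entity names are all hypothetical.

```python
# Hypothetical sketch: emit a surface sentence for each KG triple plus the
# sentence entailed by its relation's logic (symmetry or inversion).
# All relation names and templates here are illustrative assumptions.

SYMMETRIC = {"friend"}        # r(a, b) entails r(b, a)
INVERSE = {"father": "son"}   # r(a, b) entails r_inv(b, a)

def sentences_from_triple(head: str, relation: str, tail: str) -> list[str]:
    """Return the forward sentence plus its logical consequence, if any."""
    out = [f"{head} is the {relation} of {tail}."]
    if relation in SYMMETRIC:
        # Symmetric relation: swap the entities, keep the relation word.
        out.append(f"{tail} is the {relation} of {head}.")
    elif relation in INVERSE:
        # Inverse relation: swap the entities and use the inverse word.
        out.append(f"{tail} is the {INVERSE[relation]} of {head}.")
    return out

# Build a toy training corpus from a handful of triples.
corpus = []
for triple in [("Alice", "friend", "Bob"), ("Carl", "father", "Dan")]:
    corpus.extend(sentences_from_triple(*triple))
```

Holding some entailed sentences out of the corpus, rather than emitting both, is what would turn memorization checks into the logical-inference and reversal tests the abstract describes.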