Transformers Can Learn Rules They've Never Seen: Proof of Computation Beyond Interpolation

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether Transformers can learn and generalize rules absent from their training data, rather than relying solely on interpolation. Through two controlled experiments (masking specific local input patterns in an XOR cellular automaton, and requiring the model to emit the intermediate steps of a symbolic derivation), it provides an existence proof that Transformers can explicitly learn and extrapolate to unseen rule structures, challenging strong interpolation-only accounts. Using a two-layer Transformer analyzed with circuit extraction and soft unrolling, the study compares against interpolation baselines: KNN, MLP, and kernel ridge regression. On the XOR task, 47/60 trials (78.3%) converge and the best run reaches 100% accuracy; accuracy hinges on multi-step constraint propagation, reaching 96.7% with soft unrolling versus 63.1% without. On the symbolic chain task, the Transformer averages 41.8% accuracy (up to 78.6% on individual holdout pairs), substantially outperforming every baseline: kernel ridge regression averages 4.3%, while KNN and MLP score 0% on every pair.
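To make the second experiment's setup concrete, here is a hypothetical sketch of a proof-like target format for integer operator chains: the model must emit every intermediate step before the final answer. The operator names (`add`, `sub`, `mul`) and the line format are illustrative assumptions, not the paper's exact specification.

```python
# Hypothetical sketch of a proof-like target for symbolic operator chains.
# Operator names and formatting are illustrative, not the paper's exact spec.

OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}

def render_chain(start, chain):
    # Emit one line per intermediate derivation step, then the final answer.
    value, lines = start, []
    for name, arg in chain:
        nxt = OPS[name](value, arg)
        lines.append(f"{name}({value}, {arg}) = {nxt}")
        value = nxt
    lines.append(f"answer: {value}")
    return "\n".join(lines)

target = render_chain(3, [("add", 4), ("mul", 2), ("sub", 5)])
print(target)
```

In this framing, "removing intermediate-step supervision" (which the abstract reports degrades performance) would correspond to training on only the final `answer:` line.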

📝 Abstract
A central question in the LLM debate is whether transformers can infer rules absent from training, or whether apparent generalisation reduces to similarity-based interpolation over observed examples. We test a strong interpolation-only hypothesis in two controlled settings: one where interpolation is ruled out by construction and proof, and one where success requires emitting intermediate symbolic derivations rather than only final answers. In Experiment 1, we use a cellular automaton with a pure XOR transition rule and remove specific local input patterns from training; since XOR is linearly inseparable, each held-out pattern's nearest neighbours have the opposite label, so similarity-based predictors fail on the held-out region. Yet a two-layer transformer recovers the rule (best 100%; 47/60 converged runs), and circuit extraction identifies XOR computation. Performance depends on multi-step constraint propagation: without unrolling, accuracy matches output bias (63.1%), while soft unrolling reaches 96.7%. In Experiment 2, we study symbolic operator chains over integers with one operator pair held out; the model must emit intermediate steps and a final answer in a proof-like format. Across all 49 holdout pairs, the transformer exceeds every interpolation baseline (mean 41.8%, up to 78.6%; mean KRR 4.3%; KNN and MLP score 0% on every pair), while removing intermediate-step supervision degrades performance. Together with a construction showing that a standard transformer block can implement exact local Boolean rules, these results provide an existence proof that transformers can learn rule structure not directly observed in training and express it explicitly, ruling out the strongest architectural form of interpolation-only accounts: that transformers cannot in principle discover and communicate unseen rules, while leaving open when such behaviour arises in large-scale language training.
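The abstract's inseparability argument can be sketched in a few lines (this is not the paper's code, and reduces the local pattern to its two XOR-relevant bits): when one input pattern is held out, its Hamming-nearest training patterns all carry the opposite label, so a nearest-neighbour vote is guaranteed to mispredict it.

```python
# Minimal sketch (not the paper's code) of why similarity-based predictors
# fail on a held-out XOR pattern: every nearest training neighbour of the
# masked input carries the opposite label.

def xor_rule(left, right):
    # Pure XOR transition, reduced to its two relevant bits.
    return left ^ right

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = {p: xor_rule(*p) for p in patterns}

held_out = (1, 1)                       # removed from training by construction
train = [p for p in patterns if p != held_out]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# The Hamming-nearest training patterns to (1, 1) are (0, 1) and (1, 0);
# both are labelled 1, while xor(1, 1) = 0, so any 1-NN vote must be wrong.
d_min = min(hamming(p, held_out) for p in train)
nearest = [p for p in train if hamming(p, held_out) == d_min]
nn_votes = [labels[p] for p in nearest]
knn_prediction = max(set(nn_votes), key=nn_votes.count)

print(labels[held_out], knn_prediction)   # true label 0, 1-NN predicts 1
```

The same argument applies symmetrically to any of the four patterns, which is why no similarity threshold or vote size rescues the baseline on the held-out region.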
Problem

Research questions and friction points this paper is trying to address.

transformers
rule learning
generalization
interpolation
out-of-distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

rule learning
transformer architecture
beyond interpolation
symbolic reasoning
circuit extraction