🤖 AI Summary
Compact positional encodings in Transformers induce two fundamental phenomena—*isolation*, wherein models fail to jointly learn adjacent simple sequence patterns, and *continuity*, wherein learned sequences generate attractor basins that erroneously pull nearby sequences into incorrect fixed points. Method: We provide the first rigorous mathematical proof that any compact positional encoding necessarily gives rise to both phenomena, and establish their causal link to degraded generalization. Using attractor basin modeling, theoretical analysis, and controlled synthetic sequence experiments, we empirically validate these predictions across multiple Transformer architectures. Contribution/Results: Our findings demonstrate that sequence learning capacity is fundamentally constrained by the topological properties of positional encodings—not by model capacity or training strategy—thereby revealing a foundational theoretical limitation for Transformer architecture design. This work unifies isolation and continuity as inherent consequences of compactness, offering principled guidance for developing more expressive positional representations.
📝 Abstract
Understanding how Transformers work and how they process information is key to the theoretical and empirical advancement of these models. In this work, we demonstrate the existence of two phenomena in Transformers, namely isolation and continuity. Both phenomena hinder Transformers from learning even simple sequence patterns. Isolation means that any learnable sequence must be isolated from every other learnable sequence; hence, some sequences cannot be learned by a single Transformer at the same time. Continuity entails that an attractor basin forms around a learned sequence, such that any sequence falling into that basin collapses towards the learned sequence. We mathematically prove that these phenomena emerge in every Transformer that uses a compact positional encoding, and we design rigorous experiments demonstrating that these theoretical limitations occur at practical scale.
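The attractor-basin behavior described above can be illustrated with a minimal toy sketch, assuming a contraction map as a stand-in for the model's update dynamics (this is not the paper's construction; `s_star` and `step` are hypothetical): a learned sequence acts as a fixed point, and any nearby sequence inside the basin is pulled onto it.

```python
import numpy as np

# Hypothetical "learned" sequence: the fixed point of the dynamics.
s_star = np.array([1.0, 2.0, 3.0])

def step(s, rate=0.5):
    # Toy stand-in for the model's update: contracts any input
    # toward the learned sequence s_star at a fixed rate.
    return s + rate * (s_star - s)

# A perturbed sequence that starts inside the attractor basin.
s = np.array([1.4, 1.7, 3.3])
for _ in range(20):
    s = step(s)

# After repeated updates, s has collapsed onto the learned sequence.
print(np.allclose(s, s_star))  # True
```

The collapse of `s` onto `s_star` mirrors the continuity phenomenon: the distinct input sequence is not preserved but is absorbed by the learned fixed point.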