Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models

📅 2025-05-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Compact positional encodings in Transformers induce two fundamental phenomena—*isolation*, wherein models fail to jointly learn adjacent simple sequence patterns, and *continuity*, wherein learned sequences generate attractor basins that erroneously pull nearby sequences into incorrect fixed points. Method: We provide the first rigorous mathematical proof that any compact positional encoding necessarily gives rise to both phenomena, and establish their causal link to degraded generalization. Using attractor basin modeling, theoretical analysis, and controlled synthetic sequence experiments, we empirically validate these predictions across multiple Transformer architectures. Contribution/Results: Our findings demonstrate that sequence learning capacity is fundamentally constrained by the topological properties of positional encodings—not by model capacity or training strategy—thereby revealing a foundational theoretical limitation for Transformer architecture design. This work unifies isolation and continuity as inherent consequences of compactness, offering principled guidance for developing more expressive positional representations.
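As a reading aid, here is one way to make the compactness claim concrete; the notation below is our own illustrative choice, not necessarily the paper's definitions or theorem statements. The basic intuition: if all positional vectors live inside a compact set, then for any fixed sequence length a Transformer built from continuous layers is uniformly continuous on that input domain, which is the kind of property that lets a learned sequence capture nearby sequences (continuity) and forces jointly learnable sequences to keep their distance (isolation).

```latex
% Illustrative sketch in our own notation; the paper's precise statements may differ.
% "Compact positional encoding": every position vector p_n lies in one compact set K.
\[
  \{\, p_n \,\}_{n \ge 1} \subseteq K \subset \mathbb{R}^d, \qquad K \text{ compact.}
\]
% For a fixed sequence length, a Transformer T with continuous layers is continuous on a
% compact input domain, hence uniformly continuous (Heine--Cantor): inputs whose encodings
% are close enough are mapped to outputs that are close.
\[
  \forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad
  \lVert x - x' \rVert < \delta \;\Longrightarrow\; \lVert T(x) - T(x') \rVert < \varepsilon .
\]
```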

📝 Abstract
Understanding how Transformers work and how they process information is key to the theoretical and empirical advancement of these models. In this work, we demonstrate the existence of two phenomena in Transformers, namely isolation and continuity. Both phenomena hinder Transformers from learning even simple pattern sequences. Isolation states that any learnable sequence must be isolated from other learnable sequences, so some sequences cannot be learned by a single Transformer at the same time. Continuity states that an attractor basin forms around a learned sequence, such that any sequence falling into that basin collapses towards the learned sequence. We mathematically prove that both phenomena emerge in every Transformer that uses a compact positional encoding, and we design rigorous experiments demonstrating that these theoretical limitations also occur at practical scale.
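To make the attractor-basin picture tangible, here is a minimal, hypothetical probe of continuity; `LEARNED`, `next_token`, and the drift metric are stand-ins of our own, not the paper's models or code. The idea: corrupt a few positions of a sequence the model has already learned, let the model continue it greedily, and measure how far the continuation drifts from the learned pattern; a drift of zero means the corrupted prompt collapsed into the learned sequence's basin.

```python
# Illustrative probe for the "continuity" phenomenon: does a perturbed copy of a
# learned sequence get pulled back toward the learned sequence when the model
# continues it autoregressively?  Hypothetical setup: `next_token(prefix)` stands
# in for any trained autoregressive Transformer's greedy next-token function.
import random

LEARNED = [0, 1, 2, 3] * 8              # a simple periodic pattern the model "knows"

def next_token(prefix):
    """Stand-in mock: always continues the learned period, regardless of small
    corruptions in the prefix (i.e., a hard-wired attractor)."""
    return LEARNED[len(prefix) % len(LEARNED)]

def perturb(seq, n_flips, vocab=4):
    """Randomly overwrite n_flips positions to simulate a nearby sequence."""
    out = list(seq)
    for i in random.sample(range(len(out)), n_flips):
        out[i] = random.randrange(vocab)
    return out

def basin_probe(prompt, horizon=32):
    """Greedily extend `prompt` and report how far the continuation drifts
    from the learned pattern (0.0 means full collapse onto it)."""
    seq = list(prompt)
    for _ in range(horizon):
        seq.append(next_token(seq))
    tail = seq[-horizon:]
    target = [LEARNED[(len(prompt) + t) % len(LEARNED)] for t in range(horizon)]
    return sum(a != b for a, b in zip(tail, target)) / horizon

if __name__ == "__main__":
    random.seed(0)
    for flips in (0, 2, 4, 8):
        drift = basin_probe(perturb(LEARNED, flips))
        print(f"{flips} corrupted positions -> drift from learned sequence: {drift:.2f}")
```

In this mock the stand-in model is an attractor by construction, so every corrupted prompt collapses; run against a real trained Transformer, the same probe would map out how wide the basin actually is.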
Problem

Research questions and friction points this paper is trying to address.

Understanding Transformers' information processing mechanisms
Identifying isolation and continuity phenomena in Transformers
Proving mathematical limitations of compact positional encodings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identify isolation and continuity in Transformers
Prove both phenomena for any compact positional encoding
Design controlled experiments to validate the theoretical limitations (see the sketch after this list)
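Below is a rough sketch, under assumptions of our own (toy periodic patterns, a two-layer encoder with a learned and therefore bounded positional embedding, arbitrary hyperparameters), of the kind of controlled synthetic-sequence experiment the paper describes; it is not the authors' setup. It trains one small causal Transformer jointly on two nearby patterns and compares per-pattern next-token accuracy, which is where isolation predicts trouble.

```python
# Rough sketch of a controlled synthetic-sequence experiment (our own toy setup;
# the paper's actual architectures, data, and hyperparameters may differ).
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM, LEN = 4, 32, 24
PATTERN_A = torch.tensor(([0, 1, 2, 3] * 6)[:LEN])   # simple periodic pattern
PATTERN_B = torch.tensor(([1, 0, 3, 2] * 6)[:LEN])   # nearby but distinguishable pattern

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Embedding(LEN, DIM)             # learned (bounded) positional encoding
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, dim_feedforward=64, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, x):
        pos = torch.arange(x.size(1), device=x.device)
        h = self.emb(x) + self.pos(pos)
        # standard additive causal mask: position i may only attend to positions <= i
        mask = torch.triu(torch.full((x.size(1), x.size(1)), float("-inf"), device=x.device), diagonal=1)
        return self.out(self.enc(h, mask=mask))       # next-token logits

def next_token_accuracy(model, seq):
    with torch.no_grad():
        logits = model(seq[:-1].unsqueeze(0))
    return (logits.argmax(-1)[0] == seq[1:]).float().mean().item()

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()
batch = torch.stack([PATTERN_A, PATTERN_B])           # train on both patterns jointly

for step in range(500):
    logits = model(batch[:, :-1])
    loss = loss_fn(logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Isolation would show up as one (or both) patterns failing to reach high accuracy.
model.eval()
print("pattern A accuracy:", next_token_accuracy(model, PATTERN_A))
print("pattern B accuracy:", next_token_accuracy(model, PATTERN_B))
```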
Authors
Hector Pasten
Pontifical Catholic University of Chile, Faculty of Mathematics
Felipe Urrutia
CENIA
natural language processing, explainability
Hector Jimenez
University of Chile & CENIA
Cristian B. Calderon
CENIA
Cristóbal Rojas
Pontifical Catholic University of Chile & CENIA
Alexander Kozachinskiy
Postdoc, CENIA Chile
Theoretical Computer Science