Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation

📅 2025-03-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the geometric paradox between high-dimensional token embeddings (~10³) in large language models (LLMs) and the low-dimensional semantic nature of human language (~10¹). We systematically characterize the dimensional evolution of token representations across Transformer layers: embeddings initially diffuse, then undergo progressive projection, ultimately converging onto a ~10-dimensional semantic submanifold. We introduce “working-space dimension” — a novel intrinsic dimensionality metric — and empirically demonstrate, for the first time, that Transformers implicitly perform inter-layer dimensionality compression; crucially, this compressed dimension exhibits a significant negative correlation with model performance. Our methodology integrates intrinsic dimension estimation (MLE/TwoNN), layer-wise representational correlation analysis, cross-architecture geometric trajectory tracking, and manifold visualization. We validate the universal expansion–contraction pattern across LLaMA, Qwen, and Phi families, and develop a fine-tuning-free diagnostic tool to precisely identify over-parameterization and semantic compression failure.
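The summary names MLE and TwoNN as the intrinsic-dimension estimators used in the layer-wise analysis. As a rough illustration of the TwoNN idea (Facco et al., 2017), and not the authors' actual code, here is a minimal Python sketch; the function name `twonn_id` and the use of scikit-learn's `NearestNeighbors` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(points: np.ndarray) -> float:
    """Estimate intrinsic dimension with the TwoNN method (Facco et al., 2017).

    points: (N, D) array, e.g. token representations from one Transformer layer.
    Returns a scalar intrinsic-dimension estimate.
    """
    # Distances to the two nearest neighbours; with n_neighbors=3 the first
    # column is the point itself (distance 0) when querying the fitted data.
    nn = NearestNeighbors(n_neighbors=3).fit(points)
    dists, _ = nn.kneighbors(points)
    r1, r2 = dists[:, 1], dists[:, 2]

    # Drop points whose first-neighbour distance is zero (duplicated tokens)
    # to avoid division by zero.
    mask = r1 > 0
    mu = r2[mask] / r1[mask]

    # Under the TwoNN model, mu follows a Pareto law with exponent d;
    # the maximum-likelihood estimate is N / sum(log mu). (Implementations
    # often also trim the largest mu values for robustness.)
    return mu.size / np.sum(np.log(mu))
```

Applied layer by layer to token representations, an estimator like this would trace the dimensional trajectory the paper describes.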

📝 Abstract
The geometric evolution of token representations in large language models (LLMs) presents a fundamental paradox: while human language inherently organizes semantic information in low-dimensional spaces ($\sim 10^1$ dimensions), modern LLMs employ high-dimensional embeddings ($\sim 10^3$ dimensions) processed through Transformer architectures. This work bridges the conceptual gap by developing a geometric framework that tracks token dynamics across Transformer layers. Through layer-wise analysis of intrinsic dimensions across multiple architectures, we reveal an expansion-contraction pattern where tokens diffuse to a "working space" and then progressively project onto lower-dimensional submanifolds. Our findings imply a negative correlation between the working-space dimension and the parameter-sensitive performance of LLMs, and indicate that effective models tend to compress tokens into approximately 10-dimensional submanifolds, closely resembling human semantic spaces. This work not only advances LLM interpretability by reframing Transformer layers as projectors that mediate between high-dimensional computation and low-dimensional semantics, but also provides practical tools for model diagnostics that do not rely on task-specific evaluations.
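To make the layer-wise analysis concrete, the sketch below shows one illustrative (not the authors') way to collect hidden states from every layer of an open model and estimate their intrinsic dimension with the `twonn_id` sketch above; the checkpoint name, the tiny text batch, and the padding handling are assumptions for demonstration purposes.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint; any Hugging Face model that exposes hidden states works.
MODEL_NAME = "meta-llama/Llama-2-7b-hf"

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # LLaMA-style tokenizers lack a pad token
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# A handful of sentences; in practice one would pool tokens from a larger corpus
# so each layer contributes enough points for a stable estimate.
texts = [
    "The geometry of token representations evolves across layers.",
    "Human language organizes meaning in a low-dimensional space.",
]
inputs = tok(texts, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)

# out.hidden_states: tuple of (num_layers + 1) tensors, each (batch, seq, hidden_dim).
mask = inputs["attention_mask"].bool()
layer_dims = []
for h in out.hidden_states:
    tokens = h[mask].float().numpy()      # drop padding, shape (n_tokens, hidden_dim)
    layer_dims.append(twonn_id(tokens))   # twonn_id from the sketch above

# The expansion-contraction pattern reported in the paper would appear as a rise
# and then a fall of layer_dims, with late layers near ~10 dimensions.
print(np.round(layer_dims, 2))
```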
Problem

Research questions and friction points this paper is trying to address.

Resolve paradox between high-dimensional LLM embeddings and low-dimensional human language semantics
Analyze token dynamics and intrinsic dimensions across Transformer layers
Develop geometric framework to link model performance with dimensional reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise geometric framework tracks token dynamics
Reveals expansion-contraction pattern in token dimensions
Links low-dimensional submanifolds to model performance
Authors
Zhuo-Yang Song
Undergraduate Student of Physics, Peking University
hep-ph, cs.CL
Zeyu Li
CAS Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
Qing-Hong Cao
Peking University
high energy physics
Ming-xing Luo
Beijing Computational Science Research Center, Beijing 100193, China
Hua Xing Zhu
Peking University
Quantum Field Theory, QCD, Effective Field Theory