TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors

📅 2026-01-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing analyses of Transformers often focus on individual attention heads or layers, failing to capture the model’s global behavior and lacking a unified representation. This work proposes TensorLens, the first framework that models the entire Transformer as an input-dependent linear operator. By introducing a high-order attention interaction tensor, TensorLens holistically encodes all components—including attention mechanisms, feed-forward networks, activation functions, normalization layers, and residual connections—into a single, coherent structure. The resulting representation is end-to-end, theoretically grounded, and highly expressive. Experimental results demonstrate that TensorLens captures richer features compared to existing attention aggregation methods, offering a powerful new tool for model interpretability and structural understanding.
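The central idea, viewing a transformer as an input-dependent linear operator, can be illustrated on a toy self-attention block: once the attention weights A(X) have been computed for a given input, the block's output is exactly linear in X, so it can be written as a single matrix (one slice of the kind of high-order tensor the paper describes) acting on the flattened input. The sketch below is a minimal numpy illustration of that reading, not the paper's actual TensorLens construction; the weight names and toy dimensions are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 3  # toy embedding dimension and sequence length (assumptions)

# Toy single-head self-attention weights.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(X):
    # A depends on the input X; given A, the block is linear in X.
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))  # (n, n)
    return A @ X @ Wv, A

X = rng.standard_normal((n, d))
Y, A = attention(X)

# Input-dependent linear-operator view: with A(X) frozen, Y = A X Wv, so in
# row-major vectorized form vec(Y) = (A ⊗ Wv^T) vec(X).
M = np.kron(A, Wv.T)                     # (n*d, n*d) operator for this input
Y_lin = (M @ X.reshape(-1)).reshape(n, d)
assert np.allclose(Y, Y_lin)
```

Because A changes with X, the matrix M is different for every input; collecting such per-input linear maps across all blocks is, roughly, what a unified tensor representation of the whole model has to encode.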

📝 Abstract
Attention matrices are fundamental to transformer research, supporting a broad range of applications including interpretability, visualization, manipulation, and distillation. Yet, most existing analyses focus on individual attention heads or layers, failing to account for the model's global behavior. While prior efforts have extended attention formulations across multiple heads via averaging and matrix multiplications or incorporated components such as normalization and FFNs, a unified and complete representation that encapsulates all transformer blocks is still lacking. We address this gap by introducing TensorLens, a novel formulation that captures the entire transformer as a single, input-dependent linear operator expressed through a high-order attention-interaction tensor. This tensor jointly encodes attention, FFNs, activations, normalizations, and residual connections, offering a theoretically coherent and expressive linear representation of the model's computation. TensorLens is theoretically grounded, and our empirical validation shows that it yields richer representations than previous attention-aggregation methods. Our experiments demonstrate that the attention tensor can serve as a powerful foundation for developing tools aimed at interpretability and model understanding. Our code is attached as supplementary material.
Problem

Research questions and friction points this paper is trying to address.

transformer
attention tensor
model interpretability
global representation
high-order interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

TensorLens
high-order attention tensor
transformer analysis
unified linear representation
model interpretability
I. Atad
Blavatnik School of Computer Science and AI, Tel Aviv University
Itamar Zimerman
PhD Student, The School of Computer Science at Tel Aviv University
Deep Learning · Machine Learning · Architectures
Shahar Katz
Blavatnik School of Computer Science and AI, Tel Aviv University
Lior Wolf
The School of Computer Science at Tel Aviv University