🤖 AI Summary
Problem: The internal mechanisms of large language models still lack systematic interpretability analysis.
Method: This paper introduces CAST (Compositional Analysis via Spectral Tracking), the first framework to directly estimate the realized transformation matrices of individual Transformer layers via the Moore–Penrose pseudoinverse, enabling probe-free, fine-grained analysis of the full spectrum through six interpretable spectral metrics. By combining CKA similarity matrices with kernel analysis techniques, CAST characterizes fundamental architectural differences between encoders and decoders.
Contribution/Results: The analysis reveals that decoders exhibit a “compression–expansion” cyclic dynamic, whereas encoders preserve high-rank feature representations. Transformer layers are functionally partitioned into three distinct phases: feature extraction, compression, and specialization. CAST establishes a verifiable, generalizable paradigm for analyzing inter-layer information flow and the functional division of labor in Transformers, advancing mechanistic interpretability beyond heuristic probing.
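The core estimation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the matrix shapes, the entropy-based effective-rank metric, and all variable names are assumptions chosen for the example; the paper's six spectral metrics may be defined differently.

```python
import numpy as np

# Hypothetical setup: paired activations for one layer, with columns as
# tokens. X_in is the layer input, X_out the layer output; here we
# synthesize X_out from a known linear map so the estimate is checkable.
rng = np.random.default_rng(0)
d, n = 64, 512
X_in = rng.standard_normal((d, n))
W_true = rng.standard_normal((d, d))
X_out = W_true @ X_in

# Least-squares estimate of the realized layer transformation:
# W_hat = X_out @ pinv(X_in) minimizes ||X_out - W @ X_in||_F.
W_hat = X_out @ np.linalg.pinv(X_in)

# Spectral analysis: the singular values of W_hat indicate whether the
# layer compresses (rank collapse) or expands its input representation.
s = np.linalg.svd(W_hat, compute_uv=False)
p = s / s.sum()
effective_rank = np.exp(-(p * np.log(p)).sum())  # entropy-based rank proxy
print(round(float(effective_rank), 2))
```

Because `X_in` here has full row rank, `W_hat` recovers `W_true` exactly; with real activations the pseudoinverse instead yields the best linear approximation of the layer's behavior on the observed data.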
📝 Abstract
Large language models have achieved remarkable success but remain largely black boxes whose internal mechanisms are poorly understood. To address this limitation, researchers have proposed a variety of interpretability methods, including mechanistic analysis, probing classifiers, and activation visualization, each providing valuable insights from a different perspective. Building on this rich landscape of complementary approaches, we introduce CAST (Compositional Analysis via Spectral Tracking), a probe-free framework that contributes a novel perspective by analyzing Transformer layer functions through direct transformation-matrix estimation and comprehensive spectral analysis. CAST offers insights complementary to existing methods by estimating the realized transformation matrix of each layer using the Moore–Penrose pseudoinverse and applying spectral analysis with six interpretable metrics that characterize layer behavior. Our analysis reveals distinct behaviors between encoder-only and decoder-only models: decoder models exhibit compression–expansion cycles, while encoder models maintain consistently high-rank processing. Kernel analysis further demonstrates functional relationship patterns between layers, with CKA similarity matrices clearly partitioning layers into three phases: feature extraction, compression, and specialization.
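The CKA similarity matrices mentioned in the abstract can be illustrated with the standard linear-CKA formula. This is a generic sketch, assuming the common linear variant on centered activations; the paper may use a kernelized or otherwise adjusted version, and the function and variable names are illustrative.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices of shape (samples, features)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # HSIC-style numerator and normalizers via Frobenius norms of cross-products.
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 32))
print(round(linear_cka(X, X), 3))  # identical representations -> 1.0
print(round(linear_cka(X, rng.standard_normal((100, 32))), 3))  # unrelated -> near 0
```

Computing this score for every pair of layers yields the similarity matrix whose block structure separates the feature-extraction, compression, and specialization phases.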