🤖 AI Summary
The opaque internal computation of Transformers hinders their interpretability and robustness evaluation. To address this, we propose Entropy-Lens, the first Shannon entropy-based framework for dynamic residual-stream analysis, modeling the layer-by-layer evolution of information entropy as a universal "computational signature." The method requires no architectural priors and applies to frozen, off-the-shelf Transformers across modalities, including LLMs and ViTs. It provides three key capabilities: (1) discriminating among mainstream model families; (2) automatically classifying task prompt types; and (3) predicting output accuracy, with entropy trajectories correlating strongly with accuracy (ρ > 0.82) across multiple benchmarks. Crucially, this work establishes information-entropy evolution as a generalizable, transferable computational representation for Transformers, a novel contribution to foundation-model analysis.
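Concretely, the per-layer signature is the Shannon entropy of the distribution $p^{(\ell)}$ read out from layer $\ell$'s residual stream over the $V$ output classes (the exact readout, e.g. a logit-lens projection through the model's own unembedding, is an assumption here, not stated in this summary):

$$H\!\left(p^{(\ell)}\right) = -\sum_{i=1}^{V} p_i^{(\ell)} \log p_i^{(\ell)}$$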
📝 Abstract
Transformer models have revolutionized fields from natural language processing to computer vision, yet their internal computational dynamics remain poorly understood, raising concerns about predictability and robustness. In this work, we introduce Entropy-Lens, a scalable, model-agnostic framework that leverages information theory to interpret frozen, off-the-shelf large-scale transformers. By quantifying the evolution of Shannon entropy within intermediate residual streams, our approach extracts computational signatures that distinguish model families, categorize task-specific prompts, and correlate with output accuracy. We further demonstrate the generality of our method by extending the analysis to vision transformers. Our results suggest that entropy-based metrics can serve as a principled tool for unveiling the inner workings of modern transformer architectures.
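For concreteness, below is a minimal sketch of how such a per-layer entropy trajectory can be extracted from a frozen model. It assumes a logit-lens-style readout, i.e. projecting each layer's residual stream through the model's own final layer norm and unembedding; the model choice (`gpt2`), the helper name `entropy_trajectory`, and the GPT-2-specific attributes (`transformer.ln_f`, `lm_head`) are illustrative assumptions, not the paper's released implementation.

```python
# Sketch: Shannon-entropy trajectory across a frozen transformer's layers,
# using a logit-lens-style readout (an assumption, not the paper's code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.eval()

def entropy_trajectory(prompt: str) -> torch.Tensor:
    """Shannon entropy (in nats) of the readout distribution at each
    layer, for the last token of the prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    hs = out.hidden_states                  # embeddings + one state per block
    entropies = []
    for i, h in enumerate(hs):
        resid = h[:, -1, :]                 # last-token residual stream
        if i < len(hs) - 1:                 # final state already has ln_f applied
            resid = model.transformer.ln_f(resid)
        logits = model.lm_head(resid)       # project to vocabulary ("logit lens")
        logp = torch.log_softmax(logits, dim=-1)
        entropies.append(-(logp.exp() * logp).sum(dim=-1))
    return torch.stack(entropies).squeeze(-1)

print(entropy_trajectory("The capital of France is"))
```

Peaked, confident intermediate predictions yield low entropy while diffuse ones yield high entropy, so the resulting vector traces how quickly the model commits to an output as computation proceeds through the layers.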