Layerwise Dynamics for In-Context Classification in Transformers

📅 2026-04-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the opacity of the inference-time algorithm that Transformers use for in-context multi-class linear classification in the hard no-margin regime. It enforces feature- and label-permutation equivariance at every layer, which yields highly structured weights while maintaining functional equivalence. From these models the study extracts an explicit depth-indexed recursion, which the authors describe as, to their knowledge, the first end-to-end identified update rule to emerge inside a softmax Transformer. Attention matrices built from mixed feature-label Gram structure drive coupled updates of the training points, labels, and test probe. The resulting dynamics implement a geometry-driven algorithmic motif that can provably amplify class separation and yields robust expected class alignment.

📝 Abstract
Transformers can perform in-context classification from a few labeled examples, yet the inference-time algorithm remains opaque. We study multi-class linear classification in the hard no-margin regime and make the computation identifiable by enforcing feature- and label-permutation equivariance at every layer. This enables interpretability while maintaining functional equivalence and yields highly structured weights. From these models we extract an explicit depth-indexed recursion: an end-to-end identified, emergent update rule inside a softmax transformer, to our knowledge the first of its kind. Attention matrices formed from mixed feature-label Gram structure drive coupled updates of training points, labels, and the test probe. The resulting dynamics implement a geometry-driven algorithmic motif, which can provably amplify class separation and yields robust expected class alignment.
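
The kind of structure the abstract describes can be illustrated with a toy layer in NumPy. This is only a sketch under stated assumptions, not the recursion identified in the paper: the function name `layer_step`, the coefficients `a`, `b`, `eta`, and the exact form of the updates are hypothetical choices made for illustration. It shows one way a softmax attention matrix built from a mixed feature-label Gram matrix can drive coupled updates of training points, labels, and the test probe, and why such an update is equivariant to permutations of feature coordinates and of class labels.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_step(X, Y, x, a=1.0, b=1.0, eta=0.5):
    """One hypothetical layer (illustrative only, not the paper's rule).

    X: (n, d) training points, Y: (n, C) one-hot labels, x: (d,) test probe.
    a, b, eta are assumed scalar coefficients.
    """
    # Mixed feature-label Gram matrix. It depends on X and Y only through
    # inner products, so permuting feature or label coordinates leaves it
    # unchanged.
    G = a * (X @ X.T) + b * (Y @ Y.T)        # (n, n)
    A = softmax(G, axis=1)                    # row-wise attention

    # The probe has no label, so it attends through features only.
    w = softmax(a * (X @ x))                  # (n,)

    X_new = X + eta * (A @ X)                 # coupled update of points
    Y_new = Y + eta * (A @ Y)                 # coupled update of labels
    x_new = x + eta * (w @ X)                 # update of the test probe
    y_hat = w @ Y                             # attention-weighted label readout
    return X_new, Y_new, x_new, y_hat


# Small demonstration on random data.
rng = np.random.default_rng(0)
n, d, C = 6, 4, 3
X = rng.normal(size=(n, d))
Y = np.eye(C)[rng.integers(0, C, size=n)]
x = rng.normal(size=d)

X1, Y1, x1, y_hat = layer_step(X, Y, x)

# Equivariance check: permuting inputs permutes outputs the same way.
p = rng.permutation(d)                        # feature permutation
q = rng.permutation(C)                        # label permutation
Xp, Yp, xp, yp = layer_step(X[:, p], Y[:, q], x[p])
print(np.allclose(Xp, X1[:, p]), np.allclose(Yp, Y1[:, q]),
      np.allclose(xp, x1[p]), np.allclose(yp, y_hat[q]))
```

Because the attention weights in this sketch are functions of inner products alone, the equivariance holds by construction; the paper instead enforces it as a constraint on trained weights at every layer.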
Problem

Research questions and friction points this paper is trying to address.

in-context learning
transformer interpretability
layerwise dynamics
few-shot classification
linear classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context learning
permutation equivariance
layerwise dynamics
emergent algorithm
transformer interpretability