Order-Level Attention Similarity Across Language Models: A Latent Commonality

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior studies on contextual aggregation mechanisms in language models (LMs) are largely confined to single-model or single-head analyses, limiting insights into cross-model commonalities. Method: We propose Order-Level Attention (OLA), an order-wise decomposition of Attention Rollout, to systematically characterize contextual aggregation across diverse LMs. Additionally, we design the Transferable OLA Adapter (TOA), a training-free cross-model adapter transfer method that treats OLA as a unified syntactic feature representation. Contribution/Results: We discover that OLA exhibits highly consistent patterns across multiple LMs—including LLaMA, Qwen, and Phi—and implicitly encodes syntactic structure information. TOA, an adapter trained on this invariant representation, significantly improves downstream task performance on unseen models without any parameter updates or fine-tuning. Experimental results validate both the strong generalizability of OLA and the effectiveness of TOA for cross-model adaptation.

📝 Abstract
In this paper, we explore an important yet previously neglected question: Do context aggregation patterns across Language Models (LMs) share commonalities? While some works have investigated context aggregation or attention weights in LMs, they typically focus on individual models or attention heads, lacking a systematic analysis across multiple LMs to explore their commonalities. In contrast, we focus on the commonalities among LMs, which can deepen our understanding of LMs and even facilitate cross-model knowledge transfer. In this work, we introduce the Order-Level Attention (OLA) derived from the order-wise decomposition of Attention Rollout and reveal that the OLA at the same order across LMs exhibits significant similarities. Furthermore, we discover an implicit mapping between OLA and syntactic knowledge. Based on these two findings, we propose the Transferable OLA Adapter (TOA), a training-free cross-LM adapter transfer method. Specifically, we treat the OLA as a unified syntactic feature representation and train an adapter that takes OLA as input. Due to the similarities in OLA across LMs, the adapter generalizes to unseen LMs without requiring any parameter updates. Extensive experiments demonstrate that TOA's cross-LM generalization effectively enhances the performance of unseen LMs. Code is available at https://github.com/jinglin-liang/OLAS.
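The abstract describes OLA as an order-wise decomposition of Attention Rollout. As a rough illustration (not the authors' code; the exact formulation in the OLAS repository may differ), Attention Rollout multiplies residual-adjusted attention matrices 0.5·(A_l + I) across layers, and expanding that product groups terms by how many attention matrices they contain — the order. A minimal sketch, assuming per-layer attention matrices already averaged over heads:

```python
import numpy as np

def order_level_attention(attn_layers, max_order=None):
    """Decompose Attention Rollout into order-k terms.

    attn_layers: list of (T, T) row-stochastic attention matrices,
    one per layer (averaged over heads). Rollout composes
    0.5 * (A_l + I) over layers; expanding the product groups terms
    by the number of attention matrices they contain (the "order").
    Hypothetical illustration, not the paper's reference implementation.
    """
    T = attn_layers[0].shape[0]
    if max_order is None:
        max_order = len(attn_layers)
    # terms[k] = sum of all products containing exactly k attention matrices
    terms = {0: np.eye(T)}
    for A in attn_layers:
        new_terms = {}
        for k, M in terms.items():
            # pick the identity from this layer's 0.5 * (A + I)
            new_terms[k] = new_terms.get(k, 0) + 0.5 * M
            # pick the attention matrix from this layer
            if k + 1 <= max_order:
                new_terms[k + 1] = new_terms.get(k + 1, 0) + 0.5 * (A @ M)
        terms = new_terms
    return terms
```

Summing all orders recovers the full rollout matrix, so the order-k components form an exact decomposition; the paper's finding is that these components, at matched order, look similar across different LMs.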
Problem

Research questions and friction points this paper is trying to address.

Investigating common attention patterns across different language models
Exploring implicit mapping between attention mechanisms and syntactic knowledge
Developing transferable adapters for cross-model generalization without retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Order-Level Attention reveals cross-model similarities
Implicit mapping connects OLA with syntactic knowledge
Training-free adapter transfers across models using OLA