Cross-attention Secretly Performs Orthogonal Alignment in Recommendation Models

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
In cross-domain sequential recommendation (CDSR), cross-attention is conventionally interpreted as “residual alignment”—filtering redundant information from queries using keys/values from another domain. This work first uncovers an alternative, intrinsic mechanism: “orthogonal alignment,” wherein cross-attention spontaneously generates novel semantic representations absent in the original query inputs—without explicit constraints. Building on this insight, the authors formally model and empirically validate the coexistence and synergy of residual and orthogonal alignment. Extensive experiments across 300+ configurations demonstrate that orthogonal alignment consistently improves recommendation accuracy and cross-domain generalization. The proposed method requires no additional regularization or parameters; at comparable model size, it outperforms baselines in accuracy and achieves superior accuracy–parameter efficiency. This finding provides a new theoretical lens and practical pathway for efficient scaling of multimodal recommendation models.

📝 Abstract
Cross-domain sequential recommendation (CDSR) aims to align heterogeneous user behavior sequences collected from different domains. While cross-attention is widely used to enhance alignment and improve recommendation performance, its underlying mechanism is not fully understood. Most researchers interpret cross-attention as residual alignment, where the output is generated by removing redundant information and preserving non-redundant information from the query input, referencing data from another domain supplied as the key and value. Beyond this prevailing view, we introduce Orthogonal Alignment, a phenomenon in which cross-attention discovers novel information that is not present in the query input, and further argue that these two contrasting alignment mechanisms can co-exist in recommendation models. Across more than 300 experiments, we find that model performance improves when the query input and output of cross-attention are orthogonal. Notably, Orthogonal Alignment emerges naturally, without any explicit orthogonality constraints. Our key insight is that Orthogonal Alignment emerges naturally because it improves the scaling law. We show that baselines additionally incorporating a cross-attention module outperform parameter-matched baselines, achieving superior accuracy per model parameter. We hope these findings offer new directions for parameter-efficient scaling in multi-modal research.
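The abstract's central observation is that the query input and output of cross-attention can be (near-)orthogonal. A minimal sketch of how one might probe this, assuming single-head scaled dot-product cross-attention and using mean absolute cosine similarity between matched query rows and output rows as an illustrative orthogonality diagnostic (the paper's exact measurement may differ):

```python
import numpy as np

def cross_attention(q_in, kv_in, wq, wk, wv):
    """Single-head scaled dot-product cross-attention (no output projection)."""
    q = q_in @ wq          # queries from the target-domain sequence
    k = kv_in @ wk         # keys from the source-domain sequence
    v = kv_in @ wv         # values from the source-domain sequence
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over source positions
    return weights @ v

def mean_abs_cosine(x, y):
    """Average |cosine| between matched rows; values near 0 suggest
    orthogonal alignment, values near 1 suggest residual alignment."""
    num = np.sum(x * y, axis=-1)
    den = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1) + 1e-9
    return float(np.mean(np.abs(num / den)))

# Toy data: hypothetical target-domain queries and source-domain keys/values.
rng = np.random.default_rng(0)
d = 16
q_in = rng.standard_normal((8, d))    # target-domain behavior sequence
src = rng.standard_normal((12, d))    # source-domain behavior sequence
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

out = cross_attention(q_in, src, wq, wk, wv)
score = mean_abs_cosine(q_in, out)    # low score => output carries novel directions
```

Under the residual-alignment reading, `out` would stay close to the span of `q_in` (high cosine); the paper's claim is that, in trained CDSR models, this diagnostic instead trends toward zero without any orthogonality penalty being imposed.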
Problem

Research questions and friction points this paper is trying to address.

How does cross-attention actually align domains in recommendation models? Its mechanism is not fully understood
Why does orthogonal alignment emerge in cross-attention without any explicit orthogonality constraints?
Does cross-attention improve the scaling law and parameter efficiency of recommendation models?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shows that cross-attention performs orthogonal alignment, generating novel representations absent from the query input
Demonstrates that orthogonal alignment emerges naturally, without explicit constraints or added parameters
Improves the scaling law, achieving superior accuracy per model parameter over parameter-matched baselines