🤖 AI Summary
This work addresses the performance gap between linear attention and standard softmax attention, which arises because kernel feature maps in linear attention discard critical semantic information in the negative domain due to non-negativity constraints. To overcome this limitation, the authors propose MirrorLA, a geometric framework that leverages learnable Householder reflections to actively redirect features into the non-negative orthant, replacing the conventional passive truncation. This approach maximizes information retention while preserving strict linear complexity. By integrating a multi-scale design, block-wise isometric transformations, and variance-aware modulation, MirrorLA jointly enhances local discriminability and long-range dependency modeling. Extensive experiments demonstrate that MirrorLA achieves state-of-the-art performance among linear attention methods on major vision benchmarks, confirming its ability to simultaneously attain high efficiency and representational fidelity.
📝 Abstract
Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance. We identify the root cause of this degradation as the non-negativity constraint imposed on kernel feature maps: standard projections like ReLU act as "passive truncation" operators, indiscriminately discarding semantic information residing in the negative domain. We propose MirrorLA, a geometric framework that substitutes passive truncation with active reorientation. By leveraging learnable Householder reflections, MirrorLA rotates the feature geometry into the non-negative orthant to maximize information retention. Our approach restores representational density through a cohesive, multi-scale design: it first optimizes local discriminability via block-wise isometries, then stabilizes long-context dynamics using variance-aware modulation to diversify activations, and finally integrates dispersed subspaces via cross-head reflections to induce global covariance mixing. MirrorLA achieves state-of-the-art performance across standard benchmarks, demonstrating that strictly linear efficiency can be achieved without compromising representational fidelity.
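The contrast between "passive truncation" and "active reorientation" can be sketched numerically. The toy below (not the authors' implementation; the reflection direction `v` is random here, whereas MirrorLA learns it) shows that a Householder reflection H = I − 2vvᵀ/‖v‖² is an isometry, so reorienting a feature vector before the non-negativity map preserves its norm, while applying ReLU directly shrinks it by discarding the negative entries:

```python
import numpy as np

rng = np.random.default_rng(0)

def householder(v):
    """Householder reflection H = I - 2 vv^T / ||v||^2 (an isometry)."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

x = rng.standard_normal(8)            # feature vector with negative entries

# Passive truncation: ReLU zeroes the negative coordinates outright.
relu_x = np.maximum(x, 0.0)

# Active reorientation (sketch): reflect x, then apply the non-negativity map.
# In MirrorLA the reflection direction is learnable; here it is random.
H = householder(rng.standard_normal(8))
hx = H @ x

print(np.linalg.norm(x) - np.linalg.norm(hx))   # ~0: reflection preserves norm
print(np.linalg.norm(relu_x) <= np.linalg.norm(x))  # truncation loses energy
```

A single random reflection does not by itself make the output non-negative; the point is only that the reflection step costs O(d) per vector and loses no information, so the subsequent truncation can be made far less destructive when v is trained.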