Linear-Time Global Visual Modeling without Explicit Attention

📅 2026-05-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work reinterprets the attention mechanism as a multilayer perceptron (MLP) whose parameters are predicted dynamically from the input. Standard Transformers rely on explicit attention for global context modeling, but its quadratic computational complexity limits scalability. The proposed approach removes explicit attention and instead compresses global context implicitly into dynamically generated MLP parameters, which keeps computational complexity linear in sequence length. Experiments on vision models indicate that dynamic parameterization can serve as an effective, linear-complexity alternative to explicit attention.
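The reinterpretation described above can be checked numerically. The sketch below is an illustration, not code from the paper: it assumes single-head scaled dot-product attention, and the helper names (`attention`, `dynamic_mlp`) are hypothetical. It treats the scaled, transposed keys as first-layer weights, softmax as the activation, and the values as second-layer weights, then compares the result with ordinary attention.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # (n x k) @ (k x m) on plain nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(q, k, v):
    # Standard single-head attention: softmax(Q K^T / sqrt(d)) V.
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)

def dynamic_mlp(x, w1, w2):
    # Two-layer MLP applied to every token: softmax(x W1) W2.
    hidden = [softmax(row) for row in matmul(x, w1)]
    return matmul(hidden, w2)

# Toy inputs: 3 tokens, feature width 2 (values chosen arbitrarily).
q = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
k = [[0.7, -0.1], [0.2, 0.9], [-0.3, 0.4]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# The MLP weights are "dynamic": they are built from the input-dependent K and V.
d = len(q[0])
w1 = [[x / math.sqrt(d) for x in row] for row in transpose(k)]  # shape d x n
w2 = v                                                          # shape n x d

out_attn = attention(q, k, v)
out_mlp = dynamic_mlp(q, w1, w2)
max_diff = max(abs(a - b) for ra, rb in zip(out_attn, out_mlp) for a, b in zip(ra, rb))
```

In this view the hidden width of the MLP equals the number of tokens `n`, which is where the quadratic cost of explicit attention comes from.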

📝 Abstract

Existing research largely attributes the global sequence modeling capability of Transformers to the explicit computation of attention weights, a process that inherently incurs quadratic computational complexity. In this work, we offer a novel perspective: we demonstrate that attention can be mathematically reframed as a Multi-Layer Perceptron (MLP) equipped with dynamically predicted parameters. Through this lens, we explain attention's global modeling power not as explicit token-wise aggregation, but as an implicit process where dynamically generated parameters act as a compressed representation of the global context. Inspired by this insight, we investigate a fundamental question: can we achieve Transformer-level global sequence modeling entirely through dynamic parameterization while maintaining linear complexity, effectively replacing explicit attention? To explore this, we design various dynamic parameter prediction strategies and integrate them into standard network layers. Extensive empirical studies on vision models demonstrate that dynamic parameterization can indeed serve as a highly effective, linear-complexity alternative to explicit attention, opening new pathways for efficient sequence modeling. Code is available at https://github.com/LeapLabTHU/WeightFormer.
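To make the idea of linear-complexity dynamic parameterization concrete, here is a minimal sketch under stated assumptions. It is not the paper's prediction strategy, which the abstract does not specify: the mean-pooled context, the linear `generator` matrix, and all function names are hypothetical choices for illustration. The point is only that a fixed-width weight matrix predicted from a global summary lets every token see global context at a cost that grows linearly with the number of tokens.

```python
def mean_pool(tokens):
    # Global context vector: one pass over the tokens, O(n * d).
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def predict_weights(context, generator, d, h):
    # Hypothetical weight generator: a linear map from the context vector
    # to a flattened (d x h) weight matrix, independent of sequence length.
    flat = [sum(g * c for g, c in zip(row, context)) for row in generator]
    return [flat[i * h:(i + 1) * h] for i in range(d)]

def dynamic_layer(tokens, generator, w_out, h):
    # Per-token MLP whose first-layer weights are predicted from the whole
    # sequence. Total cost is O(n * d * h): linear in the token count n.
    d = len(tokens[0])
    w_dyn = predict_weights(mean_pool(tokens), generator, d, h)
    out = []
    for x in tokens:
        hidden = [max(0.0, sum(xi * w_dyn[i][j] for i, xi in enumerate(x)))
                  for j in range(h)]  # ReLU activation
        out.append([sum(hj * w_out[j][m] for j, hj in enumerate(hidden))
                    for m in range(len(w_out[0]))])
    return out

# Toy setup (all values assumed): d = 2 input features, h = 3 hidden units.
generator = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6],
             [0.2, 0.2], [0.7, -0.1], [-0.3, 0.5]]   # (d*h) x d
w_out = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]        # h x d, static weights

tokens_a = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
tokens_b = [[1.0, 2.0], [0.5, -1.0], [-2.0, 3.0]]    # only the last token differs

out_a = dynamic_layer(tokens_a, generator, w_out, h=3)
out_b = dynamic_layer(tokens_b, generator, w_out, h=3)
```

Although the first token is identical in `tokens_a` and `tokens_b`, its output differs between the two runs, because the predicted weights carry information about the rest of the sequence. No token-to-token attention matrix is ever formed.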

Problem

Research questions and friction points this paper is trying to address.

global sequence modeling

explicit attention

linear complexity

dynamic parameterization

vision models

Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic parameterization

linear complexity

global sequence modeling