PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

📅 2026-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the quadratic computational complexity of self-attention on long sequences by introducing the Polynomial Mixer (PoM), a token-mixing mechanism with linear complexity. PoM aggregates the input tokens into a compact representation through a learned polynomial function, from which each token then retrieves contextual information. Unlike many linear-complexity alternatives, PoM comes with a theoretical guarantee: it satisfies the contextual mapping property, so transformers equipped with it remain universal sequence-to-sequence approximators. Experiments on five diverse tasks (text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation) show that PoM matches the performance of self-attention while substantially reducing computational cost on long sequences.
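To make the complexity claim concrete, here is a back-of-the-envelope comparison (not a figure from the paper; the degree p is an assumed parameter of the polynomial expansion): self-attention materializes all pairwise token interactions, whereas a fixed-size polynomial summary is built and queried in a single pass over the tokens.

$$\underbrace{O(N^2 d)}_{\text{self-attention}} \qquad \text{vs.} \qquad \underbrace{O(N p d)}_{\text{PoM-style mixing}}$$

The ratio grows linearly with the sequence length N, which is why the savings are most visible on long sequences: at N = 8192 with a small degree p, the mixing cost of attention is on the order of a thousand times larger.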
📝 Abstract
This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a compact representation through a learned polynomial function, from which each token retrieves contextual information. We prove that PoM satisfies the contextual mapping property, ensuring that transformers equipped with PoM remain universal sequence-to-sequence approximators. We replace standard self-attention with PoM across five diverse domains: text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation. PoM matches the performance of attention-based models while drastically reducing computational cost when working with long sequences. The code is available at https://github.com/davidpicard/pom.
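The abstract describes the mechanism only at a high level. As a rough illustration of the recipe it outlines (aggregate a learned polynomial expansion of the tokens into a fixed-size state, then let each token read contextual information back from that state), here is a minimal PyTorch sketch. The class name, the elementwise degree-2 expansion, the mean aggregation, and the gated readout are all illustrative assumptions, not the authors' implementation; the real code is at https://github.com/davidpicard/pom.

```python
import torch
import torch.nn as nn

class PolyMixerSketch(nn.Module):
    """Hypothetical linear-complexity token mixer in the spirit of PoM.

    NOT the paper's implementation: the polynomial expansion, mean
    aggregation, and gated readout below are illustrative choices.
    """

    def __init__(self, dim: int, degree: int = 2):
        super().__init__()
        self.degree = degree
        # One projection per polynomial order; features are raised elementwise.
        self.order_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(degree)])
        self.readout = nn.Linear(dim, dim)          # per-token "query" path
        self.unpack = nn.Linear(degree * dim, dim)  # state back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); every step below is O(seq_len), never O(seq_len^2)
        feats = [self.order_proj[p](x) ** (p + 1) for p in range(self.degree)]
        phi = torch.cat(feats, dim=-1)             # (B, N, degree*dim) polynomial features
        state = phi.mean(dim=1, keepdim=True)      # (B, 1, degree*dim) compact summary
        gate = torch.sigmoid(self.readout(x))      # how much context each token takes
        return gate * self.unpack(state)           # broadcast: each token reads the state

# Usage: mixes a 1024-token sequence in linear time.
x = torch.randn(2, 1024, 64)
y = PolyMixerSketch(dim=64)(x)
print(y.shape)  # torch.Size([2, 1024, 64])
```

Because the aggregated state has a fixed size regardless of sequence length, doubling the number of tokens doubles the mixing cost instead of quadrupling it, which is where the long-sequence savings claimed in the abstract come from.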
Problem

Research questions and friction points this paper is trying to address.

self-attention
linear complexity
token mixing
long sequences
computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Polynomial Mixer
linear complexity
self-attention replacement
contextual mapping
token mixing
🔎 Similar Papers
No similar papers found.
Authors

David Picard
LIGM, Ecole des Ponts ParisTech (ENPC)
Machine Learning · Kernel Methods · Deep Learning · Computer Vision · Image Processing
Nicolas Dufour
Ecole des Ponts & Ecole Polytechnique, IP Paris
Computer Vision · Machine Learning · Deep Learning
Lucas Degeorge
Ecole polytechnique
Computer vision · diffusion models · conditional generative models
Arijit Ghosh
PhD Student at IP Paris
Computer Vision · Deep Learning · Machine Learning · Image Processing
Davide Allegro
Università degli Studi di Padova
Computer vision · Robotics · Camera Calibration · 3D reconstruction
Tom Ravaud
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Yohann Perron
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France; EFEO
Corentin Sautier
ENPC / valeo.ai
Zeynep Sonat Baltaci
Ecole des Ponts ParisTech
Computer Vision · Digital Humanities
Fei Meng
Hong Kong University of Science and Technology
Robotics · Motion Planning · Control
Syrine Kalleli
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Marta López-Rauhut
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Thibaut Loiseau
LIGM - IMAGINE - Ecole des Ponts
Computer Vision · Deep Learning · Computer Science
Ségolène Albouy
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Raphael Baena
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Elliot Vincent
Researcher - LASTIG lab. (Univ Gustave Eiffel, IGN, ENSG)
remote sensing · computer vision · machine learning · deep learning
Loic Landrieu
senior researcher, ENPC
machine learning · remote sensing · optimization · computer vision