PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

📅 2026-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the quadratic computational complexity of self-attention on long sequences by introducing the Polynomial Mixer (PoM), a token-mixing mechanism with linear complexity. PoM aggregates the input tokens into a compact representation through a learned polynomial function, from which each token then retrieves contextual information. Unlike many linear-complexity alternatives, PoM comes with a theoretical guarantee: it satisfies the contextual mapping property, so transformers equipped with it remain universal sequence-to-sequence approximators. Experiments on five diverse tasks (text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation) show that PoM matches the performance of self-attention while substantially reducing computational cost on long sequences.
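To make the complexity claim concrete, here is a back-of-the-envelope comparison (not a figure from the paper; the degree p is an assumed parameter of the polynomial expansion): self-attention materializes all pairwise token interactions, whereas a fixed-size polynomial summary is built and queried in a single pass over the tokens.

$$\underbrace{O(N^2 d)}_{\text{self-attention}} \qquad \text{vs.} \qquad \underbrace{O(N p d)}_{\text{PoM-style mixing}}$$

The ratio grows linearly with the sequence length N, which is why the savings are most visible on long sequences: at N = 8192 with a small degree p, the mixing cost of attention is on the order of a thousand times larger.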
📝 Abstract
This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a compact representation through a learned polynomial function, from which each token retrieves contextual information. We prove that PoM satisfies the contextual mapping property, ensuring that transformers equipped with PoM remain universal sequence-to-sequence approximators. We replace standard self-attention with PoM across five diverse domains: text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation. PoM matches the performance of attention-based models while drastically reducing computational cost when working with long sequences. The code is available at https://github.com/davidpicard/pom.
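The abstract describes the mechanism only at a high level. As a rough illustration of the recipe it outlines (aggregate a learned polynomial expansion of the tokens into a fixed-size state, then let each token read contextual information back from that state), here is a minimal PyTorch sketch. The class name, the elementwise degree-2 expansion, the mean aggregation, and the gated readout are all illustrative assumptions, not the authors' implementation; the real code is at https://github.com/davidpicard/pom.

```python
import torch
import torch.nn as nn

class PolyMixerSketch(nn.Module):
    """Hypothetical linear-complexity token mixer in the spirit of PoM.

    NOT the paper's implementation: the polynomial expansion, mean
    aggregation, and gated readout below are illustrative choices.
    """

    def __init__(self, dim: int, degree: int = 2):
        super().__init__()
        self.degree = degree
        # One projection per polynomial order; features are raised elementwise.
        self.order_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(degree)])
        self.readout = nn.Linear(dim, dim)          # per-token "query" path
        self.unpack = nn.Linear(degree * dim, dim)  # state back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); every step below is O(seq_len), never O(seq_len^2)
        feats = [self.order_proj[p](x) ** (p + 1) for p in range(self.degree)]
        phi = torch.cat(feats, dim=-1)             # (B, N, degree*dim) polynomial features
        state = phi.mean(dim=1, keepdim=True)      # (B, 1, degree*dim) compact summary
        gate = torch.sigmoid(self.readout(x))      # how much context each token takes
        return gate * self.unpack(state)           # broadcast: each token reads the state

# Usage: mixes a 1024-token sequence in linear time.
x = torch.randn(2, 1024, 64)
y = PolyMixerSketch(dim=64)(x)
print(y.shape)  # torch.Size([2, 1024, 64])
```

Because the aggregated state has a fixed size regardless of sequence length, doubling the number of tokens doubles the mixing cost instead of quadrupling it, which is where the long-sequence savings claimed in the abstract come from.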
Problem

Research questions and friction points this paper is trying to address.

self-attention
linear complexity
token mixing
long sequences
computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Polynomial Mixer
linear complexity
self-attention replacement
contextual mapping
token mixing
🔎 Similar Papers
No similar papers found.
Authors

David Picard
LIGM, Ecole des Ponts ParisTech (ENPC)
Machine Learning · Kernel Methods · Deep Learning · Computer Vision · Image Processing
Nicolas Dufour
Ecole des Ponts & Ecole Polytechnique, IP Paris
Computer Vision · Machine Learning · Deep Learning
Lucas Degeorge
Ecole polytechnique
Computer vision · diffusion models · conditional generative models
Arijit Ghosh
PhD Student at IP Paris
Computer Vision · Deep Learning · Machine Learning · Image Processing
Davide Allegro
Università degli Studi di Padova
Computer vision · Robotics · Camera Calibration · 3D reconstruction
Tom Ravaud
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Yohann Perron
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France; EFEO
Corentin Sautier
ENPC / valeo.ai
Zeynep Sonat Baltaci
Ecole des Ponts ParisTech
Computer Vision · Digital Humanities
Fei Meng
Hong Kong University of Science and Technology
Robotics · Motion Planning · Control
Syrine Kalleli
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Marta López-Rauhut
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Thibaut Loiseau
LIGM - IMAGINE - Ecole des Ponts
Computer Vision · Deep Learning · Computer Science
Ségolène Albouy
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Raphael Baena
LIGM, CNRS, Univ Gustave Eiffel, ENPC, Institut Polytechnique de Paris, France
Elliot Vincent
Researcher - LASTIG lab. (Univ Gustave Eiffel, IGN, ENSG)
remote sensing · computer vision · machine learning · deep learning
Loic Landrieu
senior researcher, ENPC
machine learning · remote sensing · optimization · computer vision