Weierstrass Positional Encoding for Vision Transformers

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of the learnable one-dimensional positional encodings commonly used in vision transformers: they disregard the intrinsic two-dimensional spatial structure of images and impose no geometric constraints, so distances between sequence indices do not track spatial distances between patches. The authors propose Weierstrass Elliptic Positional Encoding (WePE), which they present as the first approach to bring Weierstrass elliptic functions into visual positional encoding. WePE maps normalized image patch coordinates onto the complex plane and uses the Weierstrass elliptic function and its derivative to construct a four-dimensional positional representation that retains 2D geometric structure and models spatial distance relationships. The function's double periodicity gives it a lattice structure that matches the regular patch grid, and its algebraic addition formula allows the relative position of any two patches to be derived directly from their absolute encodings. Implemented with precomputed lookup tables and a plug-and-play design, WePE improves performance in most of the reported settings with no noticeable computational or memory overhead.
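
For readers unfamiliar with the function named in the summary, the standard textbook definitions are sketched below. The lattice notation $\Lambda$, the periods $\omega_1, \omega_2$, and the split of the four features into real and imaginary parts are assumptions made for illustration; the summary does not state the paper's exact parameterization.

```latex
% Period lattice (assumed notation) and the Weierstrass elliptic function
\Lambda = \{\, m\omega_1 + n\omega_2 : m, n \in \mathbb{Z} \,\}, \qquad
\wp(z) = \frac{1}{z^{2}} + \sum_{\Omega \in \Lambda \setminus \{0\}}
         \left[ \frac{1}{(z-\Omega)^{2}} - \frac{1}{\Omega^{2}} \right]

% Derivative, and double periodicity
\wp'(z) = -2 \sum_{\Omega \in \Lambda} \frac{1}{(z-\Omega)^{3}}, \qquad
\wp(z + \Omega) = \wp(z) \quad \text{for all } \Omega \in \Lambda

% One plausible four-dimensional feature for a patch at z = x + iy (assumption)
E(z) = \bigl( \operatorname{Re}\wp(z),\ \operatorname{Im}\wp(z),\
              \operatorname{Re}\wp'(z),\ \operatorname{Im}\wp'(z) \bigr)

% Addition formula, valid when \wp(z_1) \neq \wp(z_2)
\wp(z_1 + z_2) = \frac{1}{4}
  \left( \frac{\wp'(z_1) - \wp'(z_2)}{\wp(z_1) - \wp(z_2)} \right)^{2}
  - \wp(z_1) - \wp(z_2)

% Relative-position form, using that \wp is even and \wp' is odd
\wp(z_1 - z_2) = \frac{1}{4}
  \left( \frac{\wp'(z_1) + \wp'(z_2)}{\wp(z_1) - \wp(z_2)} \right)^{2}
  - \wp(z_1) - \wp(z_2)
```

The last identity is what makes relative information recoverable from absolute encodings: its right-hand side uses only $\wp$ and $\wp'$ evaluated at the two absolute positions.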

📝 Abstract

Vision Transformers have achieved remarkable success in computer vision, but their common use of learnable one-dimensional positional encodings weakens the inherent two-dimensional spatial structure of images after patch flattening. Existing positional encodings often lack geometric constraints and do not preserve a monotonic relationship between Euclidean spatial distances and sequential index distances, limiting ViTs' ability to exploit spatial proximity priors. Motivated by the usefulness of periodicity in positional encoding, we propose Weierstrass elliptic Positional Encoding (WePE), a mathematically grounded method for encoding two-dimensional coordinates in the complex domain. WePE maps normalized 2D patch coordinates onto the complex plane and constructs compact four-dimensional positional features using the Weierstrass elliptic function and its derivative. The double periodicity provides a principled representation of 2D positions, and its intrinsic lattice structure naturally matches the regular geometry of image patch grids. Its nonlinear geometric properties help model spatial distance relationships more faithfully, while the algebraic addition formula enables relative positional information between arbitrary patch pairs to be derived directly from their absolute encodings. WePE is plug-and-play and resolution-agnostic, allowing seamless integration into existing ViTs. Extensive experiments show that WePE brings consistent performance gains in most settings. With precomputed lookup tables, these improvements introduce no noticeable computational or memory overhead. Additional analyses and ablation studies further validate the effectiveness of the proposed method.
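
The pipeline described in the abstract (normalized coordinates, complex plane, function and derivative, lookup table, addition formula) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`weierstrass_p_and_dp`, `wepe_table`, `relative_p`), the square lattice with periods 1 and i, the patch-center coordinate mapping, and the truncated lattice sum are all assumptions chosen for clarity.

```python
import numpy as np

def weierstrass_p_and_dp(z, w1=1.0 + 0.0j, w2=0.0 + 1.0j, n_terms=40):
    """Approximate the Weierstrass function and its derivative at z.

    Uses a truncated sum over lattice points m*w1 + n*w2 with
    |m|, |n| <= n_terms. The periods w1, w2 are assumed values.
    """
    z = np.asarray(z, dtype=np.complex128)
    idx = np.arange(-n_terms, n_terms + 1)
    m, n = np.meshgrid(idx, idx, indexing="ij")
    omega = (m * w1 + n * w2).ravel()
    omega = omega[omega != 0]          # the origin term is added separately
    d = z[..., None] - omega           # shape (..., number_of_lattice_points)
    p = 1.0 / z**2 + np.sum(1.0 / d**2 - 1.0 / omega**2, axis=-1)
    dp = -2.0 / z**3 - 2.0 * np.sum(1.0 / d**3, axis=-1)
    return p, dp

def wepe_table(grid_h, grid_w, **kwargs):
    """Precompute a (grid_h * grid_w, 4) lookup table of positional features.

    Patch centers are normalized to (0, 1) so that no patch lands on a
    lattice point, where the function has a pole (assumed mapping).
    """
    ys = (np.arange(grid_h) + 0.5) / grid_h
    xs = (np.arange(grid_w) + 0.5) / grid_w
    z = xs[None, :] + 1j * ys[:, None]  # shape (grid_h, grid_w)
    p, dp = weierstrass_p_and_dp(z, **kwargs)
    feats = np.stack([p.real, p.imag, dp.real, dp.imag], axis=-1)
    return feats.reshape(grid_h * grid_w, 4)

def relative_p(p1, dp1, p2, dp2):
    """Value of the function at z1 - z2, from the absolute encodings only.

    Applies the addition formula with the even/odd symmetry of the
    function and its derivative; requires p1 != p2.
    """
    return 0.25 * ((dp1 + dp2) / (p1 - p2)) ** 2 - p1 - p2

if __name__ == "__main__":
    table = wepe_table(14, 14)          # e.g. a 14 x 14 patch grid
    print(table.shape)
```

Because the table depends only on the grid size, it can be computed once and reused, which is consistent with the lookup-table claim in the abstract. Note that the raw values grow large for patches near lattice points, so a practical encoder would likely rescale or project these features before adding them to patch embeddings; how the paper handles this is not stated here.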

Problem

Research questions and friction points this paper is trying to address.

Vision Transformers

positional encoding

spatial structure

geometric constraints

2D coordinates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Weierstrass elliptic function

positional encoding

Vision Transformers