Steerable Transformers

📅 2024-05-24
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the lack of strict equivariance under the special Euclidean group SE(d) in standard Vision Transformers. Methodologically: (1) steerable convolutions are employed to extract SE(d)-equivariant features; (2) a nonlinear attention mechanism is formulated in the Fourier domain—bypassing spatial interpolation artifacts—to ensure exact translational and rotational equivariance; (3) frequency-domain nonlinear activations and SE(d)-equivariant feature encoding are introduced. Evaluated on 2D/3D geometric perception benchmarks, the model substantially outperforms purely steerable CNNs, demonstrating the efficacy of equivariant attention for robust geometric modeling. The core contribution is the first integration of strict SE(d) equivariance into a Transformer backbone, establishing a novel paradigm of Fourier-domain equivariant attention.
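The summary's "frequency-domain nonlinear activations" rely on a standard property of steerable features: a rotation by angle θ acts on a frequency-m Fourier component as multiplication by the phase e^{imθ}, so any nonlinearity that touches only the magnitude commutes with rotations. Below is a minimal NumPy sketch of that equivariance check (the function name `norm_nonlinearity` and the bias value are illustrative assumptions, not the paper's API):

```python
import numpy as np

def norm_nonlinearity(z, bias=0.3):
    """Norm-based nonlinearity for SO(2)-steerable (complex) features.

    It rescales only the magnitude |z| and leaves the phase untouched,
    so it commutes with rotations, which act on a frequency-m feature
    as multiplication by the phase e^{i m theta}.
    """
    mag = np.abs(z)
    scale = np.maximum(mag - bias, 0.0) / np.maximum(mag, 1e-12)
    return scale * z

# Equivariance check: rotating then activating equals activating then rotating.
rng = np.random.default_rng(0)
z = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # frequency-m features
m, theta = 2, 0.7
rotated_then_activated = norm_nonlinearity(np.exp(1j * m * theta) * z)
activated_then_rotated = np.exp(1j * m * theta) * norm_nonlinearity(z)
assert np.allclose(rotated_then_activated, activated_then_rotated)
```

The same magnitude-only principle underlies Fourier-space activations in steerable networks generally; the paper's exact construction for SE(d) attention is more involved.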

📝 Abstract
In this work we introduce Steerable Transformers, an extension of the Vision Transformer mechanism that maintains equivariance to the special Euclidean group $\mathrm{SE}(d)$. We propose an equivariant attention mechanism that operates on features extracted by steerable convolutions. Operating in Fourier space, our network utilizes Fourier space non-linearities. Our experiments in both two and three dimensions show that adding steerable transformer layers to steerable convolutional networks enhances performance.
Problem

Research questions and friction points this paper is trying to address.

Extend Vision Transformer for SE(d) equivariance
Develop equivariant attention with steerable convolutions
Enhance performance in 2D and 3D tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Steerable Transformers
Equivariant attention mechanism
Fourier space non-linearities