🤖 AI Summary
This work addresses the lack of strict equivariance under the special Euclidean group SE(d) in standard Vision Transformers. Methodologically: (1) steerable convolutions are employed to extract SE(d)-equivariant features; (2) a nonlinear attention mechanism is formulated in the Fourier domain—bypassing spatial interpolation artifacts—to ensure exact translational and rotational equivariance; (3) frequency-domain nonlinear activations and SE(d)-equivariant feature encoding are introduced. Evaluated on 2D/3D geometric perception benchmarks, the model substantially outperforms purely steerable CNNs, demonstrating the efficacy of equivariant attention for robust geometric modeling. The core contribution is the first integration of strict SE(d) equivariance into a Transformer backbone, establishing a novel paradigm of Fourier-domain equivariant attention.
📝 Abstract
In this work we introduce Steerable Transformers, an extension of the Vision Transformer mechanism that maintains equivariance to the special Euclidean group $\mathrm{SE}(d)$. We propose an equivariant attention mechanism that operates on features extracted by steerable convolutions. Operating in Fourier space, our network employs Fourier-domain non-linearities. Our experiments in both two and three dimensions show that adding steerable transformer layers to steerable convolutional networks enhances performance.
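The core equivariance argument can be illustrated numerically. In the sketch below (a toy construction, not the paper's implementation), each token carries complex Fourier coefficients over $\mathrm{SO}(2)$ frequencies; a rotation by $\theta$ acts on the frequency-$m$ coefficient as multiplication by $e^{-im\theta}$. Attention scores built from conjugate-linear inner products are then rotation-invariant, so the attention output rotates exactly with the input. All names (`rotate`, `fourier_attention`) and the toy dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical): N tokens, each holding complex Fourier
# coefficients for SO(2) frequencies m = 0..M-1.
N, M = 4, 3
q = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
k = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
v = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

def rotate(f, theta):
    # A rotation by theta multiplies the frequency-m coefficient
    # by exp(-i * m * theta).
    m = np.arange(f.shape[-1])
    return f * np.exp(-1j * m * theta)

def fourier_attention(q, k, v):
    # Conjugate-linear inner products cancel the phases exp(-i m theta)
    # of query and key, so the scores are rotation-invariant ...
    scores = (q @ k.conj().T).real
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # ... and the weighted sum of values inherits their equivariance.
    return w @ v

theta = 0.7
out = fourier_attention(q, k, v)
out_rot = fourier_attention(rotate(q, theta), rotate(k, theta), rotate(v, theta))
# Rotating all inputs rotates the output: equivariance holds numerically.
assert np.allclose(out_rot, rotate(out, theta))
```

This only checks rotational equivariance in a single-head, single-channel toy; the paper's mechanism additionally handles translations and $d$-dimensional rotations, and composes with steerable convolutional feature extraction.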