RelFlexformer: Efficient Attention 3D-Transformers for Integrable Relative Positional Encodings

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing efficient attention mechanisms struggle to handle unstructured point clouds whose tokens are arbitrarily distributed in 3D space, and they lack support for general integrable relative positional encodings. This work proposes RelFlexformer, which introduces the non-uniform fast Fourier transform (NU-FFT) into the attention mechanism for the first time, enabling relative positional encoding through arbitrary integrable modulation functions. The approach unifies and generalizes existing grid-based relative positional encoding methods while supporting heterogeneous token distributions in arbitrary 3D space. Attention is computed in O(L log L) time for input sequences of length L. Experiments on a range of 3D datasets show that NU-FFT-driven attention modulation improves performance on 3D tasks while keeping computational overhead low.
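One way to see why an integrable modulation function and the NU-FFT fit together is sketched below. This is a derivation under stated assumptions, not necessarily the authors' exact formulation: the notation is introduced here, attention weights are assumed to factor as $A_{ij} = f(\mathbf{r}_i - \mathbf{r}_j)\,\phi(\mathbf{q}_i)^\top \phi(\mathbf{k}_j)$ for token positions $\mathbf{r}_i \in \mathbb{R}^3$ and a linear-attention feature map $\phi$, and $f$ is assumed regular enough for Fourier inversion with $\hat f(\boldsymbol{\omega}) = \int_{\mathbb{R}^3} f(\mathbf{r})\, e^{-2\pi i \boldsymbol{\omega}\cdot\mathbf{r}}\,d\mathbf{r}$. Discretizing the inversion integral on $M$ frequencies $\boldsymbol{\omega}_m$ with quadrature weights $w_m$ gives

$$
\sum_{j=1}^{L} f(\mathbf{r}_i - \mathbf{r}_j)\,\mathbf{x}_j
= \sum_{j=1}^{L} \left( \int_{\mathbb{R}^3} \hat f(\boldsymbol{\omega})\, e^{2\pi i \boldsymbol{\omega}\cdot(\mathbf{r}_i - \mathbf{r}_j)}\,d\boldsymbol{\omega} \right) \mathbf{x}_j
\approx \sum_{m=1}^{M} e^{2\pi i \boldsymbol{\omega}_m\cdot\mathbf{r}_i}\; w_m \hat f(\boldsymbol{\omega}_m) \sum_{j=1}^{L} e^{-2\pi i \boldsymbol{\omega}_m\cdot\mathbf{r}_j}\,\mathbf{x}_j ,
$$

where $\mathbf{x}_j$ stacks the flattened outer product $\phi(\mathbf{k}_j)\mathbf{v}_j^\top$ together with $\phi(\mathbf{k}_j)$ for the softmax-free normalizer. The inner sum maps nonuniform points onto a frequency grid (a type-1 NU-FFT) and the outer sum maps back to the points (a type-2 NU-FFT), so with $M = O(L)$ grid frequencies the cost is $O(L \log L)$ up to precision-dependent constants, rather than the $O(L^2)$ cost of forming every $f(\mathbf{r}_i - \mathbf{r}_j)$ explicitly.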
📝 Abstract
We present a new class of efficient attention mechanisms that apply universal 3D Relative Positional Encoding (RPE) methods given by arbitrary integrable modulation functions $f$. They lead to a new class of 3D-Transformer models, called \textit{RelFlexformers}, which flexibly integrate these RPEs and compute attention in $O(L \log L)$ time for input sequences of length $L$. RelFlexformers build on the theory of the Non-Uniform Fast Fourier Transform (NU-FFT), naturally generalizing several existing efficient RPE-attention methods from structured settings, where tokens are homogeneously embedded in unweighted grids, to general non-structured heterogeneous scenarios, where tokens' positions are arbitrarily distributed in the corresponding 3D spaces. In particular, RelFlexformers can be applied to model point clouds. Our extensive empirical evaluation on a large portfolio of 3D datasets confirms the quality improvements provided by the NU-FFT-driven attention modulation techniques in RelFlexformers.
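A minimal NumPy sketch of RPE-modulated attention over arbitrarily placed 3D tokens follows, checked against the brute-force $O(L^2)$ computation. It rests on assumptions not stated in the abstract: attention weights of the form $f(\mathbf{r}_i - \mathbf{r}_j)\,\phi(\mathbf{q}_i)^\top \phi(\mathbf{k}_j)$ with the elu+1 feature map $\phi$; a Gaussian $f$, chosen only because its Fourier transform has a closed form; and a uniform frequency grid controlled by the hypothetical parameters `h` and `m_max`. The two `phase` matrix products are direct (dense) non-uniform discrete Fourier transforms costing $O(LM)$, standing in for the fast NU-FFT; all function names are illustrative and this is not the authors' code.

```python
import numpy as np

def gaussian_rpe(diff, sigma):
    """Assumed modulation f(r) = exp(-|r|^2 / (2 sigma^2)) on offsets of shape (..., 3)."""
    return np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma**2))

def gaussian_rpe_hat(omega, sigma):
    """Closed-form Fourier transform of f, convention f_hat(w) = ∫ f(r) exp(-2πi w·r) dr."""
    return (2 * np.pi * sigma**2) ** 1.5 * np.exp(
        -2 * np.pi**2 * sigma**2 * np.sum(omega**2, axis=-1))

def feature_map(x):
    # elu(x) + 1: a common positive feature map for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def brute_force_attention(q, k, v, pos, sigma):
    # O(L^2) reference: explicitly form A_ij = f(r_i - r_j) * phi(q_i)·phi(k_j)
    fmat = gaussian_rpe(pos[:, None, :] - pos[None, :, :], sigma)
    a = fmat * (feature_map(q) @ feature_map(k).T)
    return (a @ v) / a.sum(axis=1, keepdims=True)

def fourier_rpe_attention(q, k, v, pos, sigma, h=0.25, m_max=16):
    # Hypothetical uniform frequency grid omega_m = h * m, m in [-m_max, m_max]^3.
    # Grid period 1/h must exceed the spread of position differences to avoid aliasing.
    ticks = h * np.arange(-m_max, m_max + 1)
    omega = np.stack(np.meshgrid(ticks, ticks, ticks, indexing="ij"), -1).reshape(-1, 3)
    # Quadrature weights for f(r) ≈ h^3 * sum_m f_hat(omega_m) exp(2πi omega_m·r)
    weights = h**3 * gaussian_rpe_hat(omega, sigma)
    phase = np.exp(2j * np.pi * pos @ omega.T)  # (L, M) non-uniform Fourier matrix
    qf, kf = feature_map(q), feature_map(k)
    d, e = kf.shape[1], v.shape[1]
    # Per-token payload: flattened phi(k_j) v_j^T, plus phi(k_j) for the normalizer
    payload = np.concatenate(
        [np.einsum("jd,je->jde", kf, v).reshape(len(v), -1), kf], axis=1)
    spectrum = phase.conj().T @ payload  # type-1 step: points -> frequency grid
    mixed = (phase @ (weights[:, None] * spectrum)).real  # type-2 step; ≈ F @ payload
    num = np.einsum("id,ide->ie", qf, mixed[:, : d * e].reshape(-1, d, e))
    den = np.einsum("id,id->i", qf, mixed[:, d * e:])
    return num / den[:, None]

# Usage: tokens at random positions in the unit cube
rng = np.random.default_rng(0)
L, d = 32, 4
pos = rng.uniform(0, 1, size=(L, 3))
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
fast = fourier_rpe_attention(q, k, v, pos, sigma=0.3)
exact = brute_force_attention(q, k, v, pos, sigma=0.3)
print(np.max(np.abs(fast - exact)))  # small; limited by quadrature and rounding error
```

Replacing the two dense `phase` products with type-1 and type-2 calls of an NU-FFT library (FINUFFT is one example) is what would make the cost near-linear in $L$; the dense form is kept here so the sketch needs only NumPy and its correctness is easy to inspect.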
Problem

Research questions and friction points this paper is trying to address.

3D Transformers
Relative Positional Encoding
Point Clouds
Efficient Attention
Non-Uniform Fourier Transform
Innovation

Methods, ideas, or system contributions that make the work stand out.

RelFlexformer
3D Relative Positional Encoding
Non-Uniform FFT
Efficient Attention
Point Cloud Modeling