Attention on the Sphere

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of simultaneously achieving rotational equivariance and geometric fidelity in modeling spherical data (e.g., atmospheric dynamics, cosmic microwave background, robotic vision), this paper introduces the first equivariant Transformer architecture designed specifically for the spherical domain. Methodologically, it proposes: (1) a weighted attention mechanism grounded in spherical numerical integration, yielding approximate SO(3)-equivariance; (2) geodesic neighborhood attention, which incorporates local geometric priors to enhance generalization and scalability; and (3) custom CUDA-optimized kernels with memory-efficient implementation. Evaluated on spherical shallow-water equation simulation, spherical image segmentation, and spherical depth estimation, the model substantially outperforms planar Transformer baselines. Results demonstrate that explicit geometric priors—particularly rotational equivariance and geodesic locality—are critical for improving learning performance on spherical manifolds.
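The quadrature-weighted attention described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (which uses optimized CUDA kernels); the equiangular grid, the sin(θ)-proportional weights, and the function name are illustrative assumptions. The idea is that folding quadrature weights into the softmax turns the attention sum into an approximation of an integral over the sphere, which is what yields approximate SO(3)-equivariance.

```python
import numpy as np

def spherical_attention(q, k, v, w):
    """Scaled dot-product attention with quadrature weights w folded into
    the softmax, approximating integration over the sphere (sketch)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # weight each key's contribution by its quadrature weight, then normalize
    a = np.exp(scores - scores.max(axis=-1, keepdims=True)) * w[None, :]
    a = a / a.sum(axis=-1, keepdims=True)
    return a @ v

# Illustrative equiangular grid: weights proportional to sin(theta),
# the latitude factor in the spherical area element.
nlat, nlon = 8, 16
theta = np.linspace(0.0, np.pi, nlat)
w = np.repeat(np.sin(theta), nlon)
w = w / w.sum()

n, d = nlat * nlon, 4
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, n, d))
out = spherical_attention(q, k, v, w)
```

Note that grid points near the poles receive small weights, so densely clustered polar samples do not dominate the attention the way they would under uniform weighting.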

📝 Abstract
We introduce a generalized attention mechanism for spherical domains, enabling Transformer architectures to natively process data defined on the two-dimensional sphere - a critical need in fields such as atmospheric physics, cosmology, and robotics, where preserving spherical symmetries and topology is essential for physical accuracy. By integrating numerical quadrature weights into the attention mechanism, we obtain a geometrically faithful spherical attention that is approximately rotationally equivariant, providing strong inductive biases and leading to better performance than Cartesian approaches. To further enhance both scalability and model performance, we propose neighborhood attention on the sphere, which confines interactions to geodesic neighborhoods. This approach reduces computational complexity and introduces the additional inductive bias for locality, while retaining the symmetry properties of our method. We provide optimized CUDA kernels and memory-efficient implementations to ensure practical applicability. The method is validated on three diverse tasks: simulating shallow water equations on the rotating sphere, spherical image segmentation, and spherical depth estimation. Across all tasks, our spherical Transformers consistently outperform their planar counterparts, highlighting the advantage of geometric priors for learning on spherical domains.
Problem

Research questions and friction points this paper is trying to address.

Develop spherical attention for Transformer architectures on 2D sphere data
Achieve rotational equivariance and preserve spherical symmetries in attention
Enhance scalability and performance via geodesic neighborhood attention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized attention mechanism for spherical domains
Neighborhood attention confined to geodesic neighborhoods
Optimized CUDA kernels for efficient implementation
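The geodesic neighborhood attention listed above can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the paper's CUDA implementation: the point sampling, radius, and function names are assumptions. Restricting attention to a geodesic ball adds a locality prior and, in a sparse implementation, reduces computational cost.

```python
import numpy as np

def geodesic_mask(points, radius):
    """Boolean neighborhood mask for unit vectors on the sphere;
    geodesic distance is the arccos of the pairwise dot product."""
    cosd = np.clip(points @ points.T, -1.0, 1.0)
    return np.arccos(cosd) <= radius

def neighborhood_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to masked neighbors."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # exclude non-neighbors
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)
    return a @ v

# Illustrative random points on the unit sphere
rng = np.random.default_rng(1)
x = rng.normal(size=(64, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)
mask = geodesic_mask(x, radius=np.pi / 4)

q, k, v = rng.normal(size=(3, 64, 8))
out = neighborhood_attention(q, k, v, mask)
```

Because geodesic distance is invariant under rotations of the sphere, masking by it preserves the (approximate) rotational symmetry of the weighted attention while confining interactions to local neighborhoods.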
👥 Authors

B. Bonev
NVIDIA Corporation, 95051 Santa Clara, CA, USA

Max Rietmann
NVIDIA
Topics: PDEs, Deep Learning, Numerical Methods

Andrea Paris
ETH Zürich and Massachusetts Institute of Technology
Topics: Fluid Mechanics, High Performance Computing, Immersed Methods, Cloud Dynamics

A. Carpentieri
NVIDIA Corporation, 95051 Santa Clara, CA, USA

Thorsten Kurth
NVIDIA Corporation, 95051 Santa Clara, CA, USA