🤖 AI Summary
The quadratic complexity $O(N^2)$ of standard self-attention severely limits its scalability to large-scale unstructured meshes. To address this, we propose FLARE—a linear-complexity ($O(NM)$, where $M \ll N$) self-attention mechanism that maps variable-length mesh inputs into a fixed-length latent sequence via learnable query tokens, and employs low-rank attention routing alongside multi-head parallelism for efficient global modeling. FLARE jointly achieves long-range dependency capture and computational efficiency. Evaluated on multiple neural PDE surrogate tasks, it significantly outperforms state-of-the-art methods and, for the first time, enables high-fidelity simulation on million-node unstructured meshes. To foster reproducibility and community advancement, we publicly release a new benchmark dataset, open-source implementation, and pre-trained models—accelerating the practical deployment of scientific machine learning in complex geometric domains.
📝 Abstract
The quadratic complexity of self-attention limits its applicability and scalability on large unstructured meshes. We introduce Fast Low-rank Attention Routing Engine (FLARE), a linear complexity self-attention mechanism that routes attention through fixed-length latent sequences. Each attention head performs global communication among $N$ tokens by projecting the input sequence onto a fixed-length latent sequence of $M \ll N$ tokens using learnable query tokens. By routing attention through a bottleneck sequence, FLARE learns a low-rank form of attention that can be applied at $O(NM)$ cost. FLARE not only scales to unprecedented problem sizes, but also delivers superior accuracy compared to state-of-the-art neural PDE surrogates across diverse benchmarks. We also release a new additive manufacturing dataset to spur further research. Our code is available at https://github.com/vpuri3/FLARE.py.
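The encode–decode routing described above can be sketched in a few lines (a minimal single-head NumPy illustration under our own simplifying assumptions, not the authors' implementation: the `flare_head` name is hypothetical, and learned key/value projections and multi-head parallelism are omitted):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def flare_head(x, q_latent):
    """One attention head that routes global communication among N tokens
    through M latent tokens, costing O(N*M*d) rather than O(N^2*d).

    x        : (N, d) input token features (e.g. mesh node embeddings)
    q_latent : (M, d) learnable query tokens, with M << N
    """
    d = x.shape[-1]
    scale = 1.0 / np.sqrt(d)
    # Encode: each latent query attends over all N input tokens -> (M, d)
    enc = softmax(q_latent @ x.T * scale, axis=-1)   # (M, N) attention weights
    z = enc @ x                                      # fixed-length latent sequence
    # Decode: each input token attends over the M latent tokens -> (N, d)
    dec = softmax(x @ z.T * scale, axis=-1)          # (N, M) attention weights
    return dec @ z

rng = np.random.default_rng(0)
N, M, d = 1024, 32, 64
x = rng.standard_normal((N, d))
q = rng.standard_normal((M, d))
y = flare_head(x, q)
print(y.shape)  # (1024, 64)
```

Because every pairwise interaction is mediated by the $M$-token bottleneck, the effective attention matrix has rank at most $M$, which is where the $O(NM)$ cost and the "low-rank" characterization come from.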