🤖 AI Summary
The quadratic complexity $O(N^2)$ of standard self-attention severely limits its scalability to large-scale unstructured meshes. To address this, we propose FLARE—a linear-complexity ($O(NM)$, where $M \ll N$) self-attention mechanism that maps variable-length mesh inputs into a fixed-length latent sequence via learnable query tokens, and employs low-rank attention routing alongside multi-head parallelism for efficient global modeling. FLARE jointly achieves long-range dependency capture and computational efficiency. Evaluated on multiple neural PDE surrogate tasks, it significantly outperforms state-of-the-art methods and, for the first time, enables high-fidelity simulation on million-node unstructured meshes. To foster reproducibility and community advancement, we publicly release a new benchmark dataset, open-source implementation, and pre-trained models—accelerating the practical deployment of scientific machine learning in complex geometric domains.
📝 Abstract
The quadratic complexity of self-attention limits its applicability and scalability on large unstructured meshes. We introduce Fast Low-rank Attention Routing Engine (FLARE), a linear complexity self-attention mechanism that routes attention through fixed-length latent sequences. Each attention head performs global communication among $N$ tokens by projecting the input sequence onto a fixed-length latent sequence of $M \ll N$ tokens using learnable query tokens. By routing attention through a bottleneck sequence, FLARE learns a low-rank form of attention that can be applied at $O(NM)$ cost. FLARE not only scales to unprecedented problem sizes, but also delivers superior accuracy compared to state-of-the-art neural PDE surrogates across diverse benchmarks. We also release a new additive manufacturing dataset to spur further research. Our code is available at https://github.com/vpuri3/FLARE.py.
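The encode–decode routing described above can be sketched in a few lines (a minimal single-head NumPy illustration under our own simplifying assumptions, not the authors' implementation: the `flare_head` name is hypothetical, and learned key/value projections and multi-head parallelism are omitted):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def flare_head(x, q_latent):
    """One attention head that routes global communication among N tokens
    through M latent tokens, costing O(N*M*d) rather than O(N^2*d).

    x        : (N, d) input token features (e.g. mesh node embeddings)
    q_latent : (M, d) learnable query tokens, with M << N
    """
    d = x.shape[-1]
    scale = 1.0 / np.sqrt(d)
    # Encode: each latent query attends over all N input tokens -> (M, d)
    enc = softmax(q_latent @ x.T * scale, axis=-1)   # (M, N) attention weights
    z = enc @ x                                      # fixed-length latent sequence
    # Decode: each input token attends over the M latent tokens -> (N, d)
    dec = softmax(x @ z.T * scale, axis=-1)          # (N, M) attention weights
    return dec @ z

rng = np.random.default_rng(0)
N, M, d = 1024, 32, 64
x = rng.standard_normal((N, d))
q = rng.standard_normal((M, d))
y = flare_head(x, q)
print(y.shape)  # (1024, 64)
```

Because every pairwise interaction is mediated by the $M$-token bottleneck, the effective attention matrix has rank at most $M$, which is where the $O(NM)$ cost and the "low-rank" characterization come from.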