FLARE: Fast Low-rank Attention Routing Engine

📅 2025-08-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
The quadratic complexity $O(N^2)$ of standard self-attention severely limits its scalability to large-scale unstructured meshes. To address this, we propose FLARE—a linear-complexity ($O(NM)$, where $M \ll N$) self-attention mechanism that maps variable-length mesh inputs into a fixed-length latent sequence via learnable query tokens, and employs low-rank attention routing alongside multi-head parallelism for efficient global modeling. FLARE jointly achieves long-range dependency capture and computational efficiency. Evaluated on multiple neural PDE surrogate tasks, it significantly outperforms state-of-the-art methods and, for the first time, enables high-fidelity simulation on million-node unstructured meshes. To foster reproducibility and community advancement, we publicly release a new benchmark dataset, open-source implementation, and pre-trained models—accelerating the practical deployment of scientific machine learning in complex geometric domains.

📝 Abstract
The quadratic complexity of self-attention limits its applicability and scalability on large unstructured meshes. We introduce Fast Low-rank Attention Routing Engine (FLARE), a linear-complexity self-attention mechanism that routes attention through fixed-length latent sequences. Each attention head performs global communication among $N$ tokens by projecting the input sequence onto a fixed-length latent sequence of $M \ll N$ tokens using learnable query tokens. By routing attention through a bottleneck sequence, FLARE learns a low-rank form of attention that can be applied at $O(NM)$ cost. FLARE not only scales to unprecedented problem sizes, but also delivers superior accuracy compared to state-of-the-art neural PDE surrogates across diverse benchmarks. We also release a new additive manufacturing dataset to spur further research. Our code is available at https://github.com/vpuri3/FLARE.py.
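The abstract's routing scheme can be sketched as follows. This is a minimal single-head illustration, not the authors' implementation: it assumes the latent bottleneck is formed by letting the $M$ learnable query tokens attend to all $N$ inputs (encode), and that outputs are recovered by letting each input token attend back to the $M$ latents (decode), so each step costs $O(NMd)$ rather than $O(N^2 d)$. The function and variable names (`flare_head`, `Q`) are illustrative, and learned projections of keys/values are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def flare_head(X, Q):
    """One attention head with low-rank routing (illustrative sketch).

    X: (N, d) input tokens, e.g. features at N mesh nodes.
    Q: (M, d) learnable query tokens with M << N.
    Returns: (N, d) globally mixed output tokens.
    """
    d = X.shape[1]
    # Encode: M latent tokens attend to all N inputs -> O(N*M*d).
    A_enc = softmax(Q @ X.T / np.sqrt(d), axis=-1)   # (M, N)
    Z = A_enc @ X                                    # (M, d) latent sequence
    # Decode: each of the N tokens attends to the M latents -> O(N*M*d).
    A_dec = softmax(X @ Q.T / np.sqrt(d), axis=-1)   # (N, M)
    return A_dec @ Z                                 # (N, d)
```

The composed attention map `A_dec @ A_enc` is an $N \times N$ matrix of rank at most $M$, which is the "low-rank form of attention" the abstract refers to, but it is never materialized; only the $(M, N)$ and $(N, M)$ factors are.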
Problem

Research questions and friction points this paper is trying to address.

Reducing self-attention's quadratic complexity on large meshes
Routing attention through fixed-length low-rank latent sequences
Enabling scalable, efficient global communication among tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank attention routing for linear complexity
Fixed-length latent sequence projection
Learnable query tokens for global communication
Vedant Puri
Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Aditya Joglekar
Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Kevin Ferguson
Research Assistant
Rotorcraft Flight Dynamics
Yu-hsuan Chen
Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Yongjie Jessica Zhang
George Tallman Ladd and Florence Barrett Ladd Professor, Carnegie Mellon University
Computational Geometry, Isogeometric Analysis, Mesh Generation, Data-Driven Modeling, Finite Element
Levent Burak Kara
Carnegie Mellon University
Computer Aided Design, Computer Graphics, Mechanical Engineering, Machine Learning, Artificial