Transformer Approximations from ReLUs

📅 2026-04-27
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes a systematic approach that leverages the approximation capabilities of ReLU neural networks for elementary functions—such as multiplication, reciprocal, and min/max operations—to constructively approximate the softmax attention mechanism in Transformers. By establishing the first systematic mapping between ReLU approximation theory and attention mechanisms, the method enables goal-directed, resource-efficient approximations with explicit upper bounds on complexity. In contrast to generic universal approximation results, this study not only yields substantially tighter theoretical bounds but also opens new analytical pathways for enhancing both the interpretability and computational efficiency of Transformer models.
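To make the multiplication primitive concrete, here is a minimal sketch of one classical way a ReLU network can approximate a product on [0, 1]. It uses the well-known sawtooth construction for squaring (often attributed to Yarotsky) together with a polarization identity. This is an illustration from the general ReLU approximation literature and is not necessarily the construction used in this paper. The function names and the depth parameter `m` are hypothetical.

```python
def relu(x):
    """Rectified linear unit."""
    return x if x > 0 else 0.0


def hat(x):
    """Hat (tent) function on [0, 1], built from three ReLU units."""
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)


def relu_square(x, m):
    """Approximate x**2 for x in [0, 1] using m composed hat layers.

    The piecewise-linear interpolant has error at most 2**(-2*m - 2).
    """
    out = x
    g = x
    for s in range(1, m + 1):
        g = hat(g)          # s-fold composition gives a sawtooth
        out -= g / 4 ** s   # subtract the scaled sawtooth correction
    return out


def relu_multiply(x, y, m):
    """Approximate x*y for x, y in [0, 1] via
    x*y = 2*((x + y)/2)**2 - x**2/2 - y**2/2."""
    return (2 * relu_square((x + y) / 2, m)
            - relu_square(x, m) / 2
            - relu_square(y, m) / 2)


if __name__ == "__main__":
    print(relu_multiply(0.3, 0.8, 6))
```

Each additional layer cuts the squaring error by a factor of four, which is the kind of target-specific resource bound (accuracy versus depth) that a generic universal approximation statement does not provide.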
📝 Abstract
We provide a systematic recipe for translating ReLU approximation results to the softmax attention mechanism. This recipe covers many common approximation targets. Importantly, it yields target-specific, economical resource bounds that go beyond universal approximation statements. We showcase the recipe on multiplication, reciprocal computation, and min/max primitives. These results provide new tools for analyzing softmax transformer models.
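As a rough illustration of the min/max primitives, the sketch below contrasts the exact ReLU identities for max and min with a softmax-attention approximation of max. It assumes a single attention head whose scores and values are both the inputs themselves, scaled by a hypothetical inverse temperature `beta`. This setup is an assumption made for illustration, not the paper's recipe.

```python
import math


def relu(x):
    """Rectified linear unit."""
    return x if x > 0 else 0.0


def relu_max(a, b):
    """Exact max of two numbers with one ReLU: max(a, b) = a + relu(b - a)."""
    return a + relu(b - a)


def relu_min(a, b):
    """Exact min of two numbers with one ReLU: min(a, b) = a - relu(a - b)."""
    return a - relu(a - b)


def softmax_attention(scores, values, beta):
    """Softmax-weighted average of values, with scores scaled by beta."""
    top = max(scores)  # subtract the largest score for numerical stability
    weights = [math.exp(beta * (s - top)) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def attention_max(xs, beta):
    """Approximate max(xs) by attending over xs with scores equal to xs.

    The output never exceeds max(xs) and falls short of it by at most
    log(len(xs)) / beta.
    """
    return softmax_attention(xs, xs, beta)


if __name__ == "__main__":
    xs = [0.1, 0.7, 0.3]
    print(relu_max(2.0, 5.0), relu_min(2.0, 5.0))
    print(attention_max(xs, beta=50.0))
```

The ReLU version is exact but handles only two inputs per unit, while the attention version handles any number of inputs at once and trades accuracy against the size of `beta`.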
Problem

Research questions and friction points this paper is trying to address.

Transformer
ReLU approximation
softmax attention
function approximation
resource bounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

ReLU approximation
softmax attention
transformer analysis
resource bounds
function primitives