🤖 AI Summary
This work proposes a systematic approach for carrying ReLU approximation results over to the softmax attention mechanism in Transformers. It starts from known ReLU network approximations of elementary functions (multiplication, reciprocal computation, and min/max operations) and constructs softmax attention approximations of the same targets. By establishing a systematic mapping between ReLU approximation theory and attention mechanisms, the method gives target-specific, resource-efficient approximations with explicit bounds on complexity. In contrast to generic universal approximation results, these bounds are tailored to each target function, and the study provides new analytical tools for studying softmax Transformer models.
📝 Abstract
We provide a systematic recipe for translating ReLU approximation results to the softmax attention mechanism. This recipe covers many common approximation targets. Importantly, it yields target-specific, economical resource bounds that go beyond universal approximation statements. We showcase the recipe on multiplication, reciprocal computation, and min/max primitives. These results provide new tools for analyzing softmax transformer models.
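To make the min/max primitive concrete, here is a minimal sketch of the two sides the recipe connects. It is not the paper's construction: it uses the standard identity max(a, b) = b + ReLU(a - b) on the ReLU side, and a single-query softmax attention head with a sharpness parameter on the attention side. The function names and the parameter `beta` are assumptions made for illustration.

```python
import math


def relu(x: float) -> float:
    """ReLU activation, written without calling max()."""
    return x if x > 0.0 else 0.0


def relu_max(a: float, b: float) -> float:
    """Exact two-input maximum from one ReLU unit: max(a, b) = b + ReLU(a - b)."""
    return b + relu(a - b)


def relu_min(a: float, b: float) -> float:
    """Exact two-input minimum from one ReLU unit: min(a, b) = a - ReLU(a - b)."""
    return a - relu(a - b)


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_attention_max(values: list[float], beta: float) -> float:
    """Illustrative single-query attention head (hypothetical setup).

    Attention scores are beta * value and the attended values are the
    inputs themselves, so the output is a softmax-weighted average that
    concentrates on the largest input as beta grows.
    """
    weights = softmax([beta * v for v in values])
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    xs = [1.0, 2.0, 5.0]
    print(relu_max(3.0, -1.0), relu_min(3.0, -1.0))
    for beta in (0.0, 1.0, 10.0, 50.0):
        print(beta, softmax_attention_max(xs, beta))
```

At `beta = 0` the attention output is the plain average of the inputs, and it approaches the maximum as `beta` increases. The ReLU identity is exact, while the attention version is only approximate at any finite `beta`, which is the kind of gap that target-specific resource bounds are meant to quantify.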