AI Summary
This work investigates the conditions under which soft-attention Transformers can exactly simulate hard attention, i.e., deterministically attending to specific subsequences of the input.
Method: We introduce a synergistic mechanism combining temperature scaling with unbounded positional encodings, enabling precise control over attention concentration.
Contribution/Results: We establish, for the first time, necessary and sufficient theoretical conditions for soft attention to emulate hard attention. We prove that this mechanism allows Softmax attention to exactly compute a broad class of linear temporal logic (LTL) formulas and to strictly simulate all uniform-tieless average-based hard attention models. Our analysis reveals the critical role of the temperature parameter and positional encoding design in governing logical expressivity and its transfer across attention paradigms. This significantly extends the theoretical expressive capacity of soft-attention models and provides novel foundations for interpretability and formal verification of attention mechanisms.
Abstract
We study conditions under which transformers using soft attention can simulate hard attention, that is, effectively focus all attention on a subset of positions. First, we examine several variants of linear temporal logic, whose formulas have previously been shown to be computable by hard attention transformers. We demonstrate how soft attention transformers can compute formulas of these logics using unbounded positional embeddings or temperature scaling. Second, we demonstrate how temperature scaling allows softmax transformers to simulate a large subclass of average-hard attention transformers, namely those that have what we call the uniform-tieless property.
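The temperature-scaling idea can be seen numerically: dividing attention scores by a temperature before the softmax concentrates the resulting distribution on the highest-scoring position as the temperature shrinks. The following is a minimal sketch (not from the paper; the scores are hypothetical) of this behavior, assuming a unique maximum score (the "tieless" case):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical attention scores; position 1 has the unique maximum.
scores = np.array([2.0, 3.5, 1.0, 3.4])

# As the temperature T decreases, softmax(scores / T) approaches
# a one-hot distribution on the argmax, i.e., hard attention.
for T in [1.0, 0.1, 0.01]:
    w = softmax(scores / T)
    print(f"T={T}: weights={np.round(w, 4)}")
```

At T=1.0 the weight on the argmax is well below 1 (the nearby score 3.4 still receives substantial mass), while at T=0.01 essentially all mass lands on position 1, illustrating how soft attention can emulate hard attention in the limit.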