Unique Hard Attention: A Tale of Two Sides

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the expressive power of unique hard attention in Transformers, focusing on the distinction between leftmost- and rightmost-winning tie-breaking when several positions attain the maximal attention score, and on the resulting logical characterizations relative to Linear Temporal Logic (LTL). Using formal language theory and finite-precision modeling, the paper shows that finite-precision transformers with both leftmost- and rightmost-hard attention are equivalent to LTL, whereas models restricted to leftmost-hard attention capture only a strictly weaker fragment of LTL. Leftmost-hard attention is further shown to be expressively equivalent to soft attention, suggesting that such models may better approximate real-world Transformers than rightmost-attention models. These findings refine prior characterizations of hard attention's expressive boundaries and underscore the role of attention directionality.

📝 Abstract
Understanding the expressive power of transformers has recently attracted attention, as it offers insights into their abilities and limitations. Many studies analyze unique hard attention transformers, where attention selects a single position that maximizes the attention scores. When multiple positions achieve the maximum score, either the rightmost or the leftmost of those is chosen. In this paper, we highlight the importance of this seeming triviality. Recently, finite-precision transformers with both leftmost- and rightmost-hard attention were shown to be equivalent to Linear Temporal Logic (LTL). We show that this no longer holds with only leftmost-hard attention -- in that case, they correspond to a *strictly weaker* fragment of LTL. Furthermore, we show that models with leftmost-hard attention are equivalent to *soft* attention, suggesting they may better approximate real-world transformers than right-attention models. These findings refine the landscape of transformer expressivity and underscore the role of attention directionality.
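The tie-breaking rule the abstract highlights can be made concrete with a small sketch (the function name and signature below are illustrative, not from the paper): unique hard attention places all weight on one maximal-score position, and the leftmost/rightmost variants differ only in which maximizer wins when there is a tie.

```python
def unique_hard_attention(scores, tie_break="leftmost"):
    """Return the index of the single attended position.

    Unique hard attention selects a position with the maximal score;
    when several positions tie, directionality decides the winner.
    """
    m = max(scores)
    winners = [i for i, s in enumerate(scores) if s == m]
    # Leftmost-winning picks the earliest maximizer, rightmost the latest.
    return winners[0] if tie_break == "leftmost" else winners[-1]


# Positions 1 and 2 tie for the maximum score 0.9:
scores = [0.2, 0.9, 0.9, 0.1]
print(unique_hard_attention(scores, "leftmost"))   # → 1
print(unique_hard_attention(scores, "rightmost"))  # → 2
```

On score vectors without ties the two variants coincide; the paper's point is that this seemingly minor difference separates their expressive power.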
Problem

Research questions and friction points this paper is trying to address.

Analyzes the expressive power of unique hard attention transformers.
Compares leftmost- and rightmost-hard attention tie-breaking in transformers.
Links leftmost-hard attention to a strictly weaker fragment of LTL.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unique hard attention selects a single maximal-score position.
Leftmost-hard attention alone captures only a strict fragment of LTL.
Leftmost-hard attention is expressively equivalent to soft attention.