🤖 AI Summary
Large language models (LLMs) perform poorly on certain logical reasoning tasks under in-context learning (ICL), such as interpreting the first-order quantifiers "all" and "some" and learning linear functions, and the paper identifies Softmax, the scoring function in the attention mechanism, as a contributing factor. To address this, the authors propose scaled signed averaging (SSA), a sign-aware alternative to Softmax that dramatically improves performance on these target tasks. SSA is architecture-agnostic: in both encoder-only and decoder-only transformers, SSA-based models match or exceed their Softmax-based counterparts across a variety of linguistic probing tasks. The work suggests that rethinking the attention scoring function itself can ease structured-reasoning bottlenecks in LLMs.
📝 Abstract
Large language models (LLMs) exhibit a remarkable capacity to learn by analogy, known as in-context learning (ICL). However, recent studies have revealed limitations in this ability. In this paper, we examine these limitations on tasks involving first-order quantifiers such as *all* and *some*, as well as on ICL with linear functions. We identify Softmax, the scoring function in the attention mechanism, as a contributing factor to these constraints. To address this, we propose **scaled signed averaging (SSA)**, a novel alternative to Softmax. Empirical results show that SSA dramatically improves performance on our target tasks. Furthermore, we evaluate both encoder-only and decoder-only transformer models with SSA, demonstrating that they match or exceed their Softmax-based counterparts across a variety of linguistic probing tasks.
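The abstract does not give SSA's formula, so the sketch below is only a hedged illustration of why a sign-aware scoring function can behave differently from Softmax. It contrasts Softmax, whose weights are strictly positive and sum to 1 (every token contributes with the same sign), with a hypothetical signed-averaging normalization that divides scores by the sum of their absolute values, letting negative scores become negative weights. The `signed_averaging` function is an assumption for illustration, not the paper's exact SSA definition.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Standard attention scoring: strictly positive weights summing to 1."""
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

def signed_averaging(scores: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Hypothetical sign-aware normalization (NOT the paper's exact SSA formula):
    scale scores by the sum of their absolute values, so negative scores
    yield negative weights and a token can be actively suppressed."""
    return scores / (np.abs(scores).sum() + eps)

scores = np.array([2.0, -1.0, 0.5])
w_soft = softmax(scores)
w_signed = signed_averaging(scores)

# Softmax: every weight is positive, so no token can contribute negatively.
assert (w_soft > 0).all() and np.isclose(w_soft.sum(), 1.0)
# Signed averaging: the negative score stays negative after normalization.
assert w_signed[1] < 0
```

The qualitative point is that a convex combination (Softmax) can only down-weight a token toward zero, whereas a signed scheme can subtract its contribution outright, which is the kind of flexibility a logical operator like negation or *some*-vs-*all* discrimination might exploit.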