🤖 AI Summary
Large language models (LLMs) perform poorly on certain logical reasoning tasks under in-context learning (ICL), such as interpreting the first-order quantifiers "all" and "some" and learning linear functions, and the paper identifies Softmax, the scoring function in the attention mechanism, as a contributing factor. To address this, the authors propose scaled signed averaging (SSA), a sign-aware alternative to Softmax that dramatically improves performance on these target tasks. SSA is architecture-agnostic: in both encoder-only and decoder-only transformers, SSA-based models match or exceed their Softmax-based counterparts across a variety of linguistic probing tasks. The work suggests that rethinking the attention scoring function itself can ease structured-reasoning bottlenecks in LLMs.
📝 Abstract
Large language models (LLMs) exhibit a remarkable capacity to learn by analogy, known as in-context learning (ICL). However, recent studies have revealed limitations in this ability. In this paper, we examine these limitations on tasks involving first-order quantifiers such as *all* and *some*, as well as on ICL with linear functions. We identify Softmax, the scoring function in the attention mechanism, as a contributing factor to these constraints. To address this, we propose **scaled signed averaging (SSA)**, a novel alternative to Softmax. Empirical results show that SSA dramatically improves performance on our target tasks. Furthermore, we evaluate both encoder-only and decoder-only transformer models with SSA, demonstrating that they match or exceed their Softmax-based counterparts across a variety of linguistic probing tasks.
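The abstract does not give SSA's formula, so the sketch below is only a hedged illustration of why a sign-aware scoring function can behave differently from Softmax. It contrasts Softmax, whose weights are strictly positive and sum to 1 (every token contributes with the same sign), with a hypothetical signed-averaging normalization that divides scores by the sum of their absolute values, letting negative scores become negative weights. The `signed_averaging` function is an assumption for illustration, not the paper's exact SSA definition.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Standard attention scoring: strictly positive weights summing to 1."""
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

def signed_averaging(scores: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Hypothetical sign-aware normalization (NOT the paper's exact SSA formula):
    scale scores by the sum of their absolute values, so negative scores
    yield negative weights and a token can be actively suppressed."""
    return scores / (np.abs(scores).sum() + eps)

scores = np.array([2.0, -1.0, 0.5])
w_soft = softmax(scores)
w_signed = signed_averaging(scores)

# Softmax: every weight is positive, so no token can contribute negatively.
assert (w_soft > 0).all() and np.isclose(w_soft.sum(), 1.0)
# Signed averaging: the negative score stays negative after normalization.
assert w_signed[1] < 0
```

The qualitative point is that a convex combination (Softmax) can only down-weight a token toward zero, whereas a signed scheme can subtract its contribution outright, which is the kind of flexibility a logical operator like negation or *some*-vs-*all* discrimination might exploit.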