LLM-based Embeddings: Attention Values Encode Sentence Semantics Better Than Hidden States

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation in current large language model (LLM)-based sentence embedding methods, which predominantly rely on hidden states optimized for next-token prediction and thus struggle to capture holistic sentence-level semantics. The study reveals, for the first time, that value vectors within the attention mechanism encode richer sentence-level semantic information than conventional hidden states. Building on this insight, the authors propose Value Aggregation—a training-free approach—and its enhanced variant, Aligned Weighted Value Aggregation (AlignedWVA), which aggregates value vectors across layers and tokens while aligning them with the output projection matrix. Without any model training, AlignedWVA substantially outperforms existing LLM-based embedding methods, including the computationally expensive MetaEOL, establishing a new state-of-the-art for training-free LLM sentence embeddings.
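The core Value Aggregation idea described above can be sketched in a few lines: instead of taking the final-layer hidden state, pool the attention value vectors across layers and token positions. This is a minimal, training-free sketch with random tensors standing in for a real model's per-layer value vectors; mean pooling over all layers and tokens is an illustrative simplification (the paper's actual layer/token selection may differ).

```python
import torch

def value_aggregation(per_layer_values, layers=None):
    """Pool attention value vectors across layers and token positions.

    per_layer_values: list of (seq_len, dim) value tensors, one per layer.
    layers: optional iterable of layer indices to aggregate (default: all).
    Returns a single (dim,) sentence embedding via mean pooling.
    """
    if layers is None:
        layers = range(len(per_layer_values))
    stacked = torch.stack([per_layer_values[i] for i in layers])  # (L, T, D)
    return stacked.mean(dim=(0, 1))  # (D,)

# Toy example: 4 layers, 6 tokens, dim 32 (random stand-ins for real values)
torch.manual_seed(0)
vals = [torch.randn(6, 32) for _ in range(4)]
emb = value_aggregation(vals)
print(tuple(emb.shape))  # (32,)
```

In practice the value tensors would come from a forward pass through the LLM (e.g. by hooking the attention modules); the pooling step itself requires no training.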

📝 Abstract
Sentence representations are foundational to many Natural Language Processing (NLP) applications. While recent methods leverage Large Language Models (LLMs) to derive sentence representations, most rely on final-layer hidden states, which are optimized for next-token prediction and thus often fail to capture global, sentence-level semantics. This paper introduces a novel perspective, demonstrating that attention value vectors capture sentence semantics more effectively than hidden states. We propose Value Aggregation (VA), a simple method that pools token value vectors across multiple layers and token positions. In a training-free setting, VA outperforms other LLM-based embeddings, and even matches or surpasses the ensemble-based MetaEOL. Furthermore, we demonstrate that, when paired with suitable prompts, a layer's attention outputs can be interpreted as aligned weighted value vectors. Specifically, the attention scores of the last token function as the weights, while the output projection matrix ($W_O$) aligns these weighted value vectors with the common space of the LLM residual stream. This refined method, termed Aligned Weighted VA (AlignedWVA), achieves state-of-the-art performance among training-free LLM-based embeddings, outperforming the high-cost MetaEOL by a substantial margin. Finally, we highlight the potential of obtaining strong LLM embedding models through fine-tuning Value Aggregation.
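The AlignedWVA construction in the abstract (last-token attention scores as weights, $W_O$ as the alignment map into the residual stream) can be illustrated with a single-layer, single-head sketch. The tensors below are random stand-ins, and the single-head simplification is an assumption for clarity; real LLMs are multi-head, and the paper aggregates such aligned outputs across layers and tokens.

```python
import torch

def aligned_weighted_va(values, attn_last, W_O):
    """One layer of AlignedWVA (single-head simplification).

    values: (T, D) value vectors at one layer.
    attn_last: (T,) attention scores of the last token over all T tokens.
    W_O: (D, D) output projection matrix of the attention block.
    The weighted sum of values, projected by W_O, is exactly the last token's
    attention output -- i.e. the value vectors aligned with the residual stream.
    """
    weighted_sum = (attn_last.unsqueeze(1) * values).sum(dim=0)  # (D,)
    return weighted_sum @ W_O  # aligned with the residual stream, (D,)

# Toy example with random stand-ins for real model tensors
torch.manual_seed(0)
T, D = 6, 32
values = torch.randn(T, D)
attn_last = torch.softmax(torch.randn(T), dim=0)  # last-token attention weights
W_O = torch.randn(D, D)
emb = aligned_weighted_va(values, attn_last, W_O)
print(tuple(emb.shape))  # (32,)
```

Note the design point this makes concrete: because the weighted-and-projected sum coincides with the layer's attention output for the last token, AlignedWVA can be read off a standard forward pass without any extra training.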
Problem

Research questions and friction points this paper is trying to address.

sentence representations
Large Language Models
hidden states
sentence semantics
embedding
Innovation

Methods, ideas, or system contributions that make the work stand out.

attention values
sentence embeddings
value aggregation
training-free
aligned weighted VA