LMK>CLS: Landmark Pooling for Dense Embeddings

📅 2026-01-29

🤖 AI Summary

This work addresses limitations of conventional pooling strategies. [CLS] pooling tends to concentrate information toward the beginning of variable-length sequences, while mean pooling can dilute locally salient features, so neither balances short- and long-range context well. To overcome this, the authors propose Landmark (LMK) pooling, which partitions the input sequence into chunks and inserts learnable landmark tokens between them. The global representation is then obtained by mean-pooling the landmark embeddings, combining local saliency with global context. Experiments on Transformer encoders show that LMK pooling matches state-of-the-art pooling methods on short-context retrieval tasks while substantially outperforming them in long-context settings, supporting it as a simple and scalable pooling operator for dense embeddings.
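The chunk-and-insert step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the landmark id `LMK_TOKEN_ID`, the fixed `chunk_size`, and the choice to place a landmark after every chunk (including the last one, so even a single-chunk input gets a landmark) are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical id for the special landmark token; a real tokenizer would
# reserve its own id for it.
LMK_TOKEN_ID = 50000


def insert_landmarks(token_ids: List[int], chunk_size: int,
                     lmk_id: int = LMK_TOKEN_ID) -> Tuple[List[int], List[int]]:
    """Split token_ids into fixed-size chunks and place a landmark after each.

    Returns the new id sequence and the positions of the landmark tokens.
    Assumption: a landmark follows every chunk, including the last one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out: List[int] = []
    lmk_positions: List[int] = []
    for start in range(0, len(token_ids), chunk_size):
        # Copy one chunk of ordinary tokens.
        out.extend(token_ids[start:start + chunk_size])
        # Record where the landmark goes, then append it.
        lmk_positions.append(len(out))
        out.append(lmk_id)
    return out, lmk_positions
```

For example, `insert_landmarks([1, 2, 3, 4, 5], chunk_size=2)` yields the ids `[1, 2, 50000, 3, 4, 50000, 5, 50000]` and landmark positions `[2, 5, 7]`. Recording the positions at insertion time lets the pooling step read the landmark embeddings directly instead of searching the sequence for the special id.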

📝 Abstract

Representation learning is central to many downstream tasks such as search, clustering, classification, and reranking. State-of-the-art sequence encoders typically collapse a variable-length token sequence to a single vector using a pooling operator, most commonly a special [CLS] token or mean pooling over token embeddings. In this paper, we identify systematic weaknesses of these pooling strategies: [CLS] tends to concentrate information toward the initial positions of the sequence and can under-represent distributed evidence, while mean pooling can dilute salient local signals, sometimes leading to worse short-context performance. To address these issues, we introduce Landmark (LMK) pooling, which partitions a sequence into chunks, inserts landmark tokens between chunks, and forms the final representation by mean-pooling the landmark token embeddings. This simple mechanism improves long-context extrapolation without sacrificing local salient features, at the cost of introducing a small number of special tokens. We empirically demonstrate that LMK pooling matches existing methods on short-context retrieval tasks and yields substantial improvements on long-context tasks, making it a practical and scalable alternative to existing pooling methods.
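To make the contrast between the three pooling operators concrete, here is a hedged pure-Python sketch that applies each one to a list of per-token output embeddings. The function names are hypothetical, padding and attention masks are ignored for brevity, and the landmark positions are assumed to be known from when the landmark tokens were inserted.

```python
from typing import List, Sequence

Vector = List[float]


def mean_of(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cls_pool(hidden: Sequence[Vector]) -> Vector:
    # [CLS] pooling: keep only the embedding at position 0.
    return list(hidden[0])


def mean_pool(hidden: Sequence[Vector]) -> Vector:
    # Mean pooling: average every token embedding (no padding mask here).
    return mean_of(hidden)


def lmk_pool(hidden: Sequence[Vector], lmk_positions: Sequence[int]) -> Vector:
    # LMK pooling: average only the embeddings at landmark positions.
    return mean_of([hidden[p] for p in lmk_positions])
```

With `hidden = [[1.0, 0.0], [0.0, 0.0], [4.0, 2.0], [0.0, 0.0], [2.0, 4.0]]` and landmarks at positions `[2, 4]`, `cls_pool` returns `[1.0, 0.0]`, `mean_pool` returns approximately `[1.4, 1.2]`, and `lmk_pool` returns `[3.0, 3.0]`. Given fixed hidden states, `lmk_pool` depends only on the landmark slots; in a real encoder those slots still attend to the tokens of the surrounding chunks.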

Problem

pooling

representation learning

sequence encoding

dense embeddings

long-context

Innovation

Landmark pooling

dense embeddings

sequence representation