🤖 AI Summary
This work investigates whether cross-encoders (CEs), traditionally confined to re-ranking, can directly generate high-quality sentence embeddings. The authors first observe that early-layer CE representations exhibit strong semantic expressiveness. Leveraging this insight, they propose a CE-to-DE knowledge distillation paradigm: intermediate CE layer features serve as teacher signals to distill discriminative knowledge into a lightweight dual-encoder (DE) via contrastive learning and a vector-space alignment loss. Evaluated on benchmarks including MS MARCO, the distilled DE achieves retrieval accuracy close to that of the full CE while enabling a 5.15× inference speedup and substantially reducing deployment overhead. The core contribution is a paradigm shift, redefining the CE not merely as a re-ranker but as an effective sentence embedding generator, and establishing the first CE→DE distillation framework explicitly designed for embedding capability transfer.
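The combined objective described above can be sketched as an in-batch contrastive (InfoNCE-style) term plus a mean-squared alignment term pulling the student DE embedding toward the teacher's intermediate-layer CE embedding. This is a minimal pure-Python illustration, not the paper's implementation; the function names, the temperature `tau`, and the weighting `alpha` are assumptions for the sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def distill_loss(de_query, de_docs, ce_teacher_query, pos_idx, tau=0.05, alpha=0.5):
    """Sketch of a CE->DE distillation objective:
    - InfoNCE contrastive loss over in-batch documents, where de_docs[pos_idx]
      is the related (positive) document for de_query;
    - MSE alignment pulling the student DE query embedding toward the
      teacher's intermediate-layer CE embedding (ce_teacher_query)."""
    sims = [cosine(de_query, d) / tau for d in de_docs]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(sims)
    log_z = m + math.log(sum(math.exp(s - m) for s in sims))
    contrastive = -(sims[pos_idx] - log_z)
    align = sum((a - b) ** 2 for a, b in zip(de_query, ce_teacher_query)) / len(de_query)
    return contrastive + alpha * align
```

With this shape, the loss is small when the query embedding sits close to its positive document and to the teacher signal, and grows when either condition fails.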
📝 Abstract
Cross-encoders (CEs) are trained with sentence pairs to detect relatedness. As CEs require sentence pairs at inference, the prevailing view is that they can only be used as re-rankers in information retrieval pipelines. Dual encoders (DEs) are instead used to embed sentences: at training time, the two sentences of a pair are encoded by separate encoders with shared weights, and a loss function ensures that the pair's embeddings lie close in vector space if the sentences are related. DEs, however, require much larger datasets to train and are less accurate than CEs. We report a curious finding that embeddings from earlier layers of CEs can in fact be used within an information retrieval pipeline. We show how to exploit CEs to distill a lighter-weight DE, with a 5.15× speedup in inference time.
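Turning an early CE layer into a sentence embedding typically amounts to pooling that layer's per-token states into one vector. A minimal sketch, assuming access to an intermediate layer's token vectors and a padding mask (e.g. the `hidden_states` returned by a transformer library when hidden-state output is enabled; the helper name here is hypothetical):

```python
def mean_pool(token_vecs, mask):
    """Masked mean pooling: average the token vectors of one transformer
    layer over non-padding positions (mask == 1) to obtain a single
    sentence embedding. token_vecs is a list of equal-length float lists."""
    dim = len(token_vecs[0])
    total = [0.0] * dim
    n = 0
    for vec, keep in zip(token_vecs, mask):
        if keep:
            n += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [t / n for t in total]
```

The resulting vectors can then be compared with cosine similarity in a standard embedding-based retrieval pipeline, which is what makes a DE fast: each text is encoded once, whereas a CE must run one forward pass per query-document pair.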