Extracting Sentence Embeddings from Pretrained Transformer Models

📅 2024-08-15
🏛️ Applied Sciences
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the inefficiency of sentence embedding extraction from pretrained Transformers (e.g., BERT). We systematically investigate and enhance three key strategies: token aggregation, representation post-processing, and external-knowledge-guided fine-tuning. Specifically, we propose novel representation shaping techniques—including weighted aggregation of multi-layer hidden states, normalized contrastive fine-tuning, and Wikidata-augmented supervision—achieving substantial improvements in semantic expressiveness of static or randomly initialized embeddings, without introducing additional parameters or inference overhead. Our approach outperforms strong baselines across 8 semantic textual similarity, 6 short-text clustering, and 12 classification tasks. Notably, optimized random embeddings achieve over 120% improvement on STS-B, approaching native BERT performance. Empirical results validate the effectiveness and cross-model generalizability of lightweight representation shaping for universal sentence embedding learning.

Technology Category

Application Category

📝 Abstract
Pre-trained transformer models shine in many natural language processing tasks and therefore are expected to bear the representation of the input sentence or text meaning. These sentence-level embeddings are also important in retrieval-augmented generation. But do commonly used plain averaging or prompt templates sufficiently capture and represent the underlying meaning? After providing a comprehensive review of existing sentence embedding extraction and refinement methods, we thoroughly test different combinations and our original extensions of the most promising ones on pretrained models. Namely, given 110 M parameters, BERT’s hidden representations from multiple layers, and many tokens, we try diverse ways to extract optimal sentence embeddings. We test various token aggregation and representation post-processing techniques. We also test multiple ways of using a general Wikitext dataset to complement BERT’s sentence embeddings. All methods are tested on eight Semantic Textual Similarity (STS), six short text clustering, and twelve classification tasks. We also evaluate our representation-shaping techniques on other static models, including random token representations. Proposed representation extraction methods improve the performance on STS and clustering tasks for all models considered. Very high improvements for static token-based models, especially random embeddings for STS tasks, almost reach the performance of BERT-derived representations. Our work shows that the representation-shaping techniques significantly improve sentence embeddings extracted from BERT-based and simple baseline models.
Problem

Research questions and friction points this paper is trying to address.

Evaluates methods for extracting sentence embeddings from transformer models.
Tests token aggregation and post-processing techniques on BERT models.
Improves performance on Semantic Textual Similarity and clustering tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes BERT's hidden layers
Tests token aggregation techniques
Improves sentence embedding performance
🔎 Similar Papers
No similar papers found.
L
Lukas Stankevicius
Faculty of Informatics, Kaunas University of Technology, LT-51368 Kaunas, Lithuania
M
M. Lukosevicius
Faculty of Informatics, Kaunas University of Technology, LT-51368 Kaunas, Lithuania