GEM: Empowering LLM for both Embedding Generation and Language Understanding

📅 2025-06-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the redundancy and semantic inconsistency that arise when large decoder-only language models (LLMs) rely on external embedding models, this work proposes an architecture-agnostic, self-supervised method. By injecting special tokens and dynamically modulating the attention mask, the approach enables native, high-quality text embedding generation within standard autoregressive decoders, without architectural modifications. This seamlessly integrates embedding capability into the LLM itself, unifying its generative and representational functions. On the MTEB benchmark, models ranging from 1B to 8B parameters achieve an average improvement of 12.7% in embedding performance, while maintaining near-lossless generative capability: MMLU accuracy drops by less than 0.3%. These results demonstrate the effectiveness and generalizability of co-optimizing generation and embedding capacities. To our knowledge, this is the first end-to-end, self-supervised framework enabling native embedding generation in decoder-only LLMs.

📝 Abstract
Large decoder-only language models (LLMs) have achieved remarkable success in generation and reasoning tasks, where they generate text responses given instructions. However, many applications, e.g., retrieval-augmented generation (RAG), still rely on separate embedding models to generate text embeddings, which can complicate the system and introduce discrepancies in how the embedding model and the LLM interpret a query. To address this limitation, we propose a simple self-supervised approach, Generative Embedding large language Model (GEM), that enables any large decoder-only LLM to generate high-quality text embeddings while maintaining its original text generation and reasoning capabilities. Our method inserts new special token(s) into a text body and generates a summary embedding of the text by manipulating the attention mask. The method can be easily integrated into the post-training or fine-tuning stage of any existing LLM. We demonstrate the effectiveness of our approach by applying it to two popular LLM families, ranging from 1B to 8B parameters, and evaluating the transformed models on both text embedding benchmarks (MTEB) and NLP benchmarks (MMLU). The results show that our proposed method significantly improves the original LLMs on MTEB while having a minimal impact on MMLU. These strong results indicate that our approach can empower LLMs with state-of-the-art text embedding capabilities while maintaining their original NLP performance.
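The abstract describes appending special token(s) and manipulating the attention mask so that a summary embedding emerges from a standard causal decoder. The paper's exact mask layout is not given in this summary, so the following is a minimal sketch under one plausible assumption: regular tokens keep ordinary causal attention and never attend to the appended summary token(s), while each summary token may attend to the full text.

```python
import numpy as np

def build_gem_mask(seq_len: int, n_special: int = 1) -> np.ndarray:
    """Hypothetical attention mask for `seq_len` regular tokens followed by
    `n_special` appended summary token(s).

    Regular tokens use standard causal attention and do not attend to the
    special token(s); each special token attends to all regular tokens (and
    to earlier special tokens), so its final hidden state can serve as a
    summary embedding of the text.

    Returns a boolean matrix where mask[i, j] == True means position i may
    attend to position j.
    """
    total = seq_len + n_special
    mask = np.tril(np.ones((total, total), dtype=bool))  # causal base
    # Explicitly forbid regular tokens from attending to special positions
    # (already implied by causality here, but stated for clarity):
    mask[:seq_len, seq_len:] = False
    return mask

mask = build_gem_mask(seq_len=4, n_special=1)
# The last row (the summary token) attends to all four text tokens plus itself.
```

Because the text tokens never see the special token, the model's original next-token predictions over the text are unchanged, which is consistent with the paper's claim of minimal impact on generation.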
Problem

Research questions and friction points this paper is trying to address.

Enabling LLMs to generate high-quality text embeddings
Eliminating reliance on separate embedding models for RAG
Maintaining original text generation and reasoning capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised method for embedding generation
Special tokens and attention mask manipulation
Maintains original text generation capabilities
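To illustrate how such native embeddings would replace a separate embedding model in a RAG pipeline, here is a toy sketch. It assumes (not stated in this summary) that the embedding is read off the final-layer hidden state at the summary-token position; the hand-made 2-dimensional "hidden states" stand in for real model activations.

```python
import numpy as np

def embed_with_summary_token(hidden_states: np.ndarray) -> np.ndarray:
    """Assumed readout: take the final-layer hidden state at the summary-token
    position (here, the last sequence position) as the text embedding,
    L2-normalized for cosine-similarity search."""
    vec = hidden_states[-1]
    return vec / np.linalg.norm(vec)

def retrieve(query_emb: np.ndarray, doc_embs: np.ndarray) -> int:
    """Return the index of the document whose normalized embedding has the
    highest cosine similarity with the query embedding."""
    return int(np.argmax(doc_embs @ query_emb))

# Toy example with hand-made "hidden states" (2-dim for readability):
query_emb = embed_with_summary_token(np.array([[1.0, 0.0], [3.0, 4.0]]))
doc_embs = np.stack([
    embed_with_summary_token(np.array([[0.0, 1.0], [0.0, 2.0]])),  # doc 0
    embed_with_summary_token(np.array([[5.0, 0.0], [3.0, 4.0]])),  # doc 1
])
best = retrieve(query_emb, doc_embs)  # doc 1 shares the query's direction
```

The point of the sketch is architectural: because the same decoder produces both the embedding and the generated answer, query understanding cannot drift between two separately trained models.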