PromptEmbedder: Efficient and Transferable Text Embedding via Dual-LLM Soft Prompting

📅 2026-05-27
🤖 AI Summary
This work addresses the inefficiency and poor transferability of existing large language model (LLM) text embedding methods, which must be retrained from scratch whenever the backbone architecture changes. The authors propose PromptEmbedder, a dual-LLM approach that decouples embedding knowledge from backbone weights. Task-specific knowledge is encoded in a dedicated Prompting LLM, which generates soft prompts for a frozen Embedding LLM through a differentiable generation process with continuous relaxation. Because this knowledge lives in the Prompting LLM and the Embedding LLM stays frozen, adapting to a new backbone requires training only a lightweight linear alignment matrix. On the MTEB benchmark, PromptEmbedder matches the performance of LoRA fine-tuning while reducing GPU memory usage by 40% and speeding up training by 3.7×, substantially improving scalability across architectures.
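The summary does not say which continuous relaxation PromptEmbedder uses. The sketch below assumes Gumbel-softmax, a common choice: instead of sampling a hard token, each soft-prompt position becomes a probability-weighted mix of embedding rows, which an autograd framework could differentiate. It also assumes that the linear alignment matrix maps these mixed vectors from the Prompting LLM's embedding space into the Embedding LLM's input space. The names `gumbel_softmax`, `mix_rows`, `align`, `E_prompt`, `W`, and all toy numbers are hypothetical. The code is plain Python, so it shows only the forward computation for a single position, not gradients or the autoregressive loop that would feed each soft token back into the Prompting LLM.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot distribution over the vocabulary (assumed Gumbel-softmax).

    Returns a probability vector instead of a hard token id; in an autograd
    framework the same arithmetic would let gradients reach `logits`.
    """
    noisy = []
    for logit in logits:
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def mix_rows(probs, table):
    """Expected token embedding: sum_i probs[i] * table[i]."""
    dim = len(table[0])
    return [sum(p * row[j] for p, row in zip(probs, table)) for j in range(dim)]


def align(W, vec):
    """Hypothetical linear alignment W @ vec, with W shaped (d_embed, d_prompt)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in W]


# Toy example: one soft-prompt position, a 3-token vocabulary,
# a 2-dim Prompting-LLM embedding space, and a 3-dim Embedding-LLM space.
rng = random.Random(0)
step_logits = [2.0, 0.5, -1.0]                    # hypothetical Prompting-LLM logits
E_prompt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical embedding table
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]          # hypothetical alignment matrix

probs = gumbel_softmax(step_logits, tau=0.5, rng=rng)
soft_token = mix_rows(probs, E_prompt)            # vector in the Prompting-LLM space
aligned_token = align(W, soft_token)              # vector fed to the frozen Embedding LLM
print(probs, aligned_token)
```

Lowering `tau` pushes `probs` toward a one-hot vector (closer to a real token) at the cost of higher-variance gradients. Under these assumptions, moving to a new backbone would mean re-fitting only `W`, with the Prompting LLM and `E_prompt` left unchanged.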
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable efficacy in text embedding, yet current adaptation methods like LoRA face significant bottlenecks in computational efficiency and cross-architecture transferability. Whenever a new backbone emerges, existing approaches require costly retraining from scratch. To address this, we propose PromptEmbedder, a novel dual-LLM framework that decouples embedding knowledge from specific backbone weights. PromptEmbedder uses a Prompting LLM to generate instruction-aware soft prompts for a frozen Embedding LLM via a differentiable generation process with continuous relaxation, ensuring full gradient flow during contrastive training. Because task-specific knowledge is localized within the Prompting LLM, adapting to new architectures requires retraining only a lightweight linear alignment matrix. Evaluations on the MTEB benchmark show that PromptEmbedder achieves performance comparable to LoRA fine-tuning while reducing GPU memory by 40% and accelerating training by 3.7×. Our approach establishes a scalable, architecture-agnostic paradigm for efficient LLM-based representation learning.
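The abstract says gradients flow through the frozen Embedding LLM during contrastive training but does not name the loss. A minimal sketch, assuming the widely used InfoNCE objective with cosine similarity and in-batch negatives: each query should score its own positive higher than every other document in the batch. The function names, the temperature of 0.05, and the toy vectors are illustrative assumptions. In the described setup, these gradients would update parameters upstream of the frozen Embedding LLM rather than the Embedding LLM itself.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(queries, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (assumed objective).

    positives[i] is the match for queries[i]; every other positives[j]
    serves as a negative for queries[i].
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in positives]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # log-sum-exp
        total += log_z - logits[i]  # cross-entropy with target index i
    return total / len(queries)


# Toy embeddings standing in for the frozen Embedding LLM's outputs.
queries = [[1.0, 0.0], [0.0, 1.0]]
docs = [[0.9, 0.1], [0.1, 0.9]]
matched = info_nce(queries, docs)
mismatched = info_nce(queries, list(reversed(docs)))
print(f"matched={matched:.4f} mismatched={mismatched:.4f}")
```

Swapping the positives raises the loss sharply. That gap is the training signal that, under these assumptions, would push the Prompting LLM toward soft prompts whose resulting embeddings separate matching pairs from non-matching ones.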
Problem

Research questions and friction points this paper is trying to address.

text embedding
computational efficiency
cross-architecture transferability
LLM adaptation
retraining cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-LLM
soft prompting
transferable embedding
parameter-efficient fine-tuning
contrastive training
Yu-Che Tsai
Department of Computer Science and Information Engineering, National Taiwan University
Kuan-Yu Chen
National Taiwan University of Science and Technology
Language Modeling, Speech Recognition, Information Retrieval, Summarization, Natural Language Processing
Yuan-Hao Chen
Department of Computer Science and Information Engineering, National Taiwan University
Yu-Han Chang
Department of Computer Science and Information Engineering, National Taiwan University
Ching-Yu Tsai
Department of Computer Science and Information Engineering, National Taiwan University
Yu-Hsiang Chuang
Department of Computer Science and Information Engineering, National Taiwan University
Shou-De Lin
National Taiwan University
AI, machine learning, natural language processing