Task-Adaptive Embedding Refinement via Test-time LLM Guidance

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge that embedding models often struggle with complex, fine-grained queries in zero-shot retrieval and classification tasks. To overcome this limitation, the authors propose a test-time approach that refines the query embedding using feedback from a large language model (LLM). The LLM gives feedback on a small set of documents, and that feedback adapts the query embedding to the downstream task without retraining the embedding model. Experiments report consistent gains across the evaluated models and benchmarks, with relative improvements of up to 25%, improving both ranking quality and the binary separation of relevant from irrelevant documents.

📝 Abstract

We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling embeddings to adapt in real time to the target task. We conduct extensive experiments with state-of-the-art text embedding models across a diverse set of challenging search and classification benchmarks. Empirical results indicate that LLM-guided query refinement yields consistent gains across all models and datasets, with relative improvements of up to +25% in literature search, intent detection, key-point matching, and nuanced query-instruction following. The refined queries improve ranking quality and induce clearer binary separation across the corpus, enabling the embedding space to better reflect the nuanced, task-specific constraints of each ad-hoc user query. Importantly, this expands the range of practical settings in which embedding models can be effectively deployed, making them a compelling alternative when costly LLM pipelines are not viable at corpus scale. We release our experimental code for reproducibility at https://github.com/IBM/task-aware-embedding-refinement.
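
To make the feedback loop in the abstract concrete, here is a minimal sketch. The abstract does not state the update rule, so this example swaps in a classic Rocchio-style relevance-feedback update as a stand-in; it is not the authors' method. The names `refine_query` and `llm_judge`, the weights `alpha`, `beta`, `gamma`, and the toy 2-D vectors are all hypothetical. In practice `llm_judge` would wrap a call to a generative LLM; here it is a plain function so the example runs on its own.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean; a zero vector when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rank(query: Sequence[float], docs: Sequence[Sequence[float]]) -> List[int]:
    """Document indices sorted by descending dot product with the query."""
    return sorted(range(len(docs)), key=lambda i: dot(query, docs[i]), reverse=True)


def refine_query(
    query: Sequence[float],
    docs: Sequence[Sequence[float]],
    llm_judge: Callable[[int], bool],  # hypothetical stand-in for an LLM relevance call
    top_k: int = 3,
    alpha: float = 1.0,   # weight of the original query (assumed value)
    beta: float = 0.75,   # pull toward documents judged relevant (assumed value)
    gamma: float = 0.25,  # push away from documents judged irrelevant (assumed value)
) -> Vector:
    """Rocchio-style update using judgments on only the top_k retrieved documents."""
    q = normalize(query)
    candidates = rank(q, docs)[:top_k]
    # Ask the judge once per candidate document.
    labels = {i: llm_judge(i) for i in candidates}
    relevant = [docs[i] for i in candidates if labels[i]]
    irrelevant = [docs[i] for i in candidates if not labels[i]]
    pos = mean(relevant, len(q))
    neg = mean(irrelevant, len(q))
    updated = [alpha * q[i] + beta * pos[i] - gamma * neg[i] for i in range(len(q))]
    return normalize(updated)


# Toy corpus: unit-normalized 2-D "embeddings" (illustrative only).
docs = [normalize(d) for d in ([0.9, 0.1], [0.7, 0.7], [0.0, 1.0], [0.8, -0.6])]
query = [1.0, 0.0]
relevant_ids = {1, 2}  # what the stand-in judge considers on-task

refined = refine_query(query, docs, lambda i: i in relevant_ids)
print("ranking before:", rank(query, docs))
print("ranking after: ", rank(refined, docs))
```

Only `top_k` documents are ever shown to the judge, which is what keeps the cost far below running an LLM over the whole corpus; the refined vector is then used to re-score every document with ordinary dot products.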

Problem

Research questions and friction points this paper is trying to address.

zero-shot search

embedding models

query refinement

task adaptation

classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time adaptation

LLM-guided refinement

embedding models