Can Instructed Retrieval Models Really Support Exploration?

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of supporting long-term interactive exploratory search, where user intent is often ambiguous and evolves dynamically, making it difficult for existing retrieval models to respond effectively to fine-grained instructions. The work presents the first systematic evaluation of instruction-following retrieval models in aspect-oriented, seed-guided exploratory search, assessing both instruction-following fidelity and ranking relevance on an expert-annotated test collection. Experiments cover both LLMs fine-tuned for instructed retrieval and general-purpose LLMs prompted for document ranking via Pairwise Ranking Prompting. Results show that while the best-performing models achieve superior ranking relevance compared to instruction-agnostic baselines, their instruction-following ability does not improve correspondingly, exhibiting insensitivity or even counterintuitive behavior in response to user directives. This reveals a critical limitation of current instruction-based retrieval approaches in interactive exploratory settings.

📝 Abstract
Exploratory searches are characterized by under-specified goals and evolving query intents. In such scenarios, retrieval models that can capture user-specified nuances in query intent and adapt results accordingly are desirable -- instruction-following retrieval models promise such a capability. In this work, we evaluate instructed retrievers for the prevalent yet under-explored application of aspect-conditional seed-guided exploration using an expert-annotated test collection. We evaluate both recent LLMs fine-tuned for instructed retrieval and general-purpose LLMs prompted for ranking with the highly performant Pairwise Ranking Prompting. We find that the best instructed retrievers improve on ranking relevance compared to instruction-agnostic approaches. However, we also find that instruction following performance, crucial to the user experience of interacting with models, does not mirror ranking relevance improvements and displays insensitivity or counter-intuitive behavior to instructions. Our results indicate that while users may benefit from using current instructed retrievers over instruction-agnostic models, they may not benefit from using them for long-running exploratory sessions requiring greater sensitivity to instructions.
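The abstract mentions ranking with general-purpose LLMs via Pairwise Ranking Prompting (PRP): the model is asked, for each pair of candidate documents, which one better matches the query, and pairwise wins are aggregated into a ranking. A minimal sketch of this aggregation scheme is below; the `mock_llm` judge is a placeholder stand-in (it simply prefers the longer passage), not the paper's actual model or prompt.

```python
# Sketch of Pairwise Ranking Prompting (PRP): query each document pair in
# both orders, count wins, and sort by win count. `mock_llm` is a
# hypothetical placeholder for a real LLM call.
from itertools import combinations

PROMPT = (
    "Query: {query}\n"
    "Passage A: {a}\n"
    "Passage B: {b}\n"
    "Which passage is more relevant to the query? Answer A or B."
)

def mock_llm(prompt: str) -> str:
    # Placeholder judge: extracts the two passages from the prompt and
    # prefers the longer one. A real system would call an LLM here.
    a = prompt.split("Passage A: ")[1].split("\nPassage B: ")[0]
    b = prompt.split("Passage B: ")[1].split("\nWhich passage")[0]
    return "A" if len(a) >= len(b) else "B"

def prp_rank(query, docs, judge=mock_llm):
    """Return document indices sorted from most to least relevant."""
    wins = {i: 0 for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        # Ask in both orders to mitigate the judge's position bias,
        # as done in PRP-style pairwise ranking.
        if judge(PROMPT.format(query=query, a=docs[i], b=docs[j])) == "A":
            wins[i] += 1
        else:
            wins[j] += 1
        if judge(PROMPT.format(query=query, a=docs[j], b=docs[i])) == "A":
            wins[j] += 1
        else:
            wins[i] += 1
    return sorted(range(len(docs)), key=lambda i: -wins[i])
```

With the length-preferring placeholder judge, `prp_rank("q", ["short", "a medium length doc", "the longest document of them all here"])` returns `[2, 1, 0]`. The double-order querying costs twice as many LLM calls but reduces sensitivity to which passage appears first in the prompt.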
Problem

Research questions and friction points this paper is trying to address.

instructed retrieval
exploratory search
instruction following
aspect-conditional exploration
retrieval relevance
Innovation

Methods, ideas, or system contributions that make the work stand out.

instructed retrieval
exploratory search
instruction following
aspect-conditional exploration
LLM-based ranking