Can Instructed Retrieval Models Really Support Exploration?

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of supporting long-term interactive exploratory search, where user intent is often ambiguous and evolves dynamically, making it difficult for existing retrieval models to respond effectively to fine-grained instructions. The work presents the first systematic evaluation of instruction-following retrieval models in aspect-oriented, seed-guided exploratory search, assessing both instruction-following fidelity and ranking relevance on an expert-annotated test collection. Experiments cover both LLMs fine-tuned for instructed retrieval and general-purpose LLMs prompted for document ranking via Pairwise Ranking Prompting. Results show that while the best-performing models achieve superior ranking relevance compared to instruction-agnostic baselines, their instruction-following ability does not improve correspondingly, exhibiting insensitivity or even counterintuitive behavior in response to user directives. This reveals a critical limitation of current instruction-based retrieval approaches in interactive exploratory settings.

📝 Abstract
Exploratory searches are characterized by under-specified goals and evolving query intents. In such scenarios, retrieval models that can capture user-specified nuances in query intent and adapt results accordingly are desirable -- instruction-following retrieval models promise such a capability. In this work, we evaluate instructed retrievers for the prevalent yet under-explored application of aspect-conditional seed-guided exploration using an expert-annotated test collection. We evaluate both recent LLMs fine-tuned for instructed retrieval and general-purpose LLMs prompted for ranking with the highly performant Pairwise Ranking Prompting. We find that the best instructed retrievers improve on ranking relevance compared to instruction-agnostic approaches. However, we also find that instruction following performance, crucial to the user experience of interacting with models, does not mirror ranking relevance improvements and displays insensitivity or counter-intuitive behavior to instructions. Our results indicate that while users may benefit from using current instructed retrievers over instruction-agnostic models, they may not benefit from using them for long-running exploratory sessions requiring greater sensitivity to instructions.
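The abstract mentions ranking with general-purpose LLMs via Pairwise Ranking Prompting (PRP): the model is asked, for each pair of candidate documents, which one better matches the query, and pairwise wins are aggregated into a ranking. A minimal sketch of this aggregation scheme is below; the `mock_llm` judge is a placeholder stand-in (it simply prefers the longer passage), not the paper's actual model or prompt.

```python
# Sketch of Pairwise Ranking Prompting (PRP): query each document pair in
# both orders, count wins, and sort by win count. `mock_llm` is a
# hypothetical placeholder for a real LLM call.
from itertools import combinations

PROMPT = (
    "Query: {query}\n"
    "Passage A: {a}\n"
    "Passage B: {b}\n"
    "Which passage is more relevant to the query? Answer A or B."
)

def mock_llm(prompt: str) -> str:
    # Placeholder judge: extracts the two passages from the prompt and
    # prefers the longer one. A real system would call an LLM here.
    a = prompt.split("Passage A: ")[1].split("\nPassage B: ")[0]
    b = prompt.split("Passage B: ")[1].split("\nWhich passage")[0]
    return "A" if len(a) >= len(b) else "B"

def prp_rank(query, docs, judge=mock_llm):
    """Return document indices sorted from most to least relevant."""
    wins = {i: 0 for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        # Ask in both orders to mitigate the judge's position bias,
        # as done in PRP-style pairwise ranking.
        if judge(PROMPT.format(query=query, a=docs[i], b=docs[j])) == "A":
            wins[i] += 1
        else:
            wins[j] += 1
        if judge(PROMPT.format(query=query, a=docs[j], b=docs[i])) == "A":
            wins[j] += 1
        else:
            wins[i] += 1
    return sorted(range(len(docs)), key=lambda i: -wins[i])
```

With the length-preferring placeholder judge, `prp_rank("q", ["short", "a medium length doc", "the longest document of them all here"])` returns `[2, 1, 0]`. The double-order querying costs twice as many LLM calls but reduces sensitivity to which passage appears first in the prompt.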
Problem

Research questions and friction points this paper is trying to address.

instructed retrieval
exploratory search
instruction following
aspect-conditional exploration
retrieval relevance
Innovation

Methods, ideas, or system contributions that make the work stand out.

instructed retrieval
exploratory search
instruction following
aspect-conditional exploration
LLM-based ranking