BloomIntent: Automating Search Evaluation with LLM-Generated Fine-Grained User Intents

📅 2025-09-23

🤖 AI Summary

Existing search evaluation methods struggle to capture users' fine-grained, heterogeneous intents, so they measure poorly how well a retrieval system serves actual user goals. BloomIntent is a user-centric evaluation method that moves the unit of assessment from the query to the intent. It generates plausible, evaluable fine-grained intents grounded in taxonomies of user attributes and information-seeking intent types, scores search results against each intent with large language models, and clusters semantically similar intents so that outcomes can be summarized in a structured interface. Across three technical evaluations, the generated intents were found to be fine-grained, evaluable, and realistic, and the automated intent-level satisfaction assessments reached 72% agreement with expert evaluators. A case study (N=4) showed that BloomIntent helped search specialists identify intents behind ambiguous queries, uncover underserved user needs, and find actionable insights for improving search experiences.

📝 Abstract

If 100 people issue the same search query, they may have 100 different goals. While existing work on user-centric AI evaluation highlights the importance of aligning systems with fine-grained user intents, current search evaluation methods struggle to represent and assess this diversity. We introduce BloomIntent, a user-centric search evaluation method that uses user intents as the evaluation unit. BloomIntent first generates a set of plausible, fine-grained search intents grounded in taxonomies of user attributes and information-seeking intent types. Then, BloomIntent provides an automated evaluation of search results against each intent, powered by large language models. To support practical analysis, BloomIntent clusters semantically similar intents and summarizes evaluation outcomes in a structured interface. With three technical evaluations, we showed that BloomIntent generated fine-grained, evaluable, and realistic intents and produced scalable assessments of intent-level satisfaction that achieved 72% agreement with expert evaluators. In a case study (N=4), we showed that BloomIntent supported search specialists in identifying intents for ambiguous queries, uncovering underserved user needs, and discovering actionable insights for improving search experiences. By shifting from query-level to intent-level evaluation, BloomIntent reimagines how search systems can be assessed: not only for performance but also for their ability to serve a multitude of user goals.
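To make the 72% figure concrete, here is a minimal sketch of how simple percent agreement between automated and expert judgments could be computed. The abstract does not specify the exact agreement measure or label set, so the function name `percent_agreement`, the binary labels, and the sample data below are all illustrative assumptions, not the paper's protocol or results.

```python
def percent_agreement(llm_labels, expert_labels):
    """Return the fraction of items where the two label lists match exactly."""
    if len(llm_labels) != len(expert_labels):
        raise ValueError("label lists must have the same length")
    if not llm_labels:
        raise ValueError("need at least one labeled item")
    matches = sum(a == b for a, b in zip(llm_labels, expert_labels))
    return matches / len(llm_labels)


# Hypothetical labels for five (intent, result page) pairs; not data from the paper.
llm = ["satisfied", "unsatisfied", "satisfied", "satisfied", "unsatisfied"]
expert = ["satisfied", "unsatisfied", "unsatisfied", "satisfied", "unsatisfied"]

print(percent_agreement(llm, expert))  # 0.8 (4 of 5 labels match)
```

Raw percent agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa is a common complement when label distributions are skewed.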

Problem

Research questions and friction points this paper is trying to address.

Evaluating search systems for diverse user intents rather than query-level performance

Automating assessment of search results against fine-grained user goals using LLMs

Addressing limitations of current methods in representing user intent diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates fine-grained user intents from taxonomies

Evaluates search results per intent using LLMs

Clusters similar intents and summarizes outcomes
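The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the taxonomy values, the function names, and the keyword-based `keyword_judge` are hypothetical stand-ins. In BloomIntent the intents and judgments come from large language models, and clustering is semantic rather than a simple grouping by intent type.

```python
from collections import defaultdict
from itertools import product

# Hypothetical taxonomy values; the paper's actual taxonomies are not listed here.
USER_ATTRIBUTES = ["novice", "expert"]
INTENT_TYPES = ["learn", "compare", "buy"]


def generate_intents(query):
    """Enumerate one candidate intent per (user attribute, intent type) pair."""
    return [
        {"query": query, "attribute": a, "type": t}
        for a, t in product(USER_ATTRIBUTES, INTENT_TYPES)
    ]


def evaluate(intents, results, judge):
    """Judge the result list against each intent using a caller-supplied function."""
    return [{**i, "satisfied": bool(judge(i, results))} for i in intents]


def summarize(evaluated):
    """Group by intent type (a stand-in for semantic clustering) and
    report the satisfaction rate of each group."""
    groups = defaultdict(list)
    for e in evaluated:
        groups[e["type"]].append(e["satisfied"])
    return {k: sum(v) / len(v) for k, v in groups.items()}


def keyword_judge(intent, results):
    """Toy judge: an intent counts as satisfied if its type appears in any result title.
    An LLM call would replace this in a real system."""
    return any(intent["type"] in r.lower() for r in results)


results = ["How to compare laptops", "Learn laptop basics"]
summary = summarize(evaluate(generate_intents("laptop"), results, keyword_judge))
print(summary)  # {'learn': 1.0, 'compare': 1.0, 'buy': 0.0}
```

In this toy run the "buy" group has a satisfaction rate of 0.0, which is the kind of underserved intent an intent-level view is meant to surface and a single query-level score would hide.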
