🤖 AI Summary
Large language models (LLMs) exhibit strong generalization capabilities but struggle to infer users’ implicit preferences, raising the fundamental question of whether conversational interaction can effectively uncover latent user needs.
Method: We introduce the first systematic, multi-task benchmark for preference inference—comprising 20 Questions, personalized question answering, and text summarization—spanning progressively more complex scenarios. Our evaluation framework employs a fine-grained, three-agent interaction protocol (user, assistant, judge) with multi-turn dialogue and context-aware per-turn assessment.
Contribution/Results: Our approach provides the first quantification of LLMs’ ability to elicit implicit user attributes across tasks, revealing substantial performance variation (32%–98%) and pronounced context sensitivity. The benchmark enables reproducible, modular analysis of preference discovery in personalized human-AI interaction, establishing an empirical foundation for future research on adaptive, user-centered dialogue systems.
📝 Abstract
Large Language Models (LLMs) excel at producing broadly relevant text, but this generality becomes a limitation when user-specific preferences are required, such as recommending restaurants or planning travel. In these scenarios, users rarely articulate every preference explicitly; instead, much of what they care about remains latent, waiting to be inferred. This raises a fundamental question: Can LLMs uncover and reason about such latent information through conversation?
We address this problem by introducing a unified benchmark for evaluating latent information discovery: the ability of LLMs to reveal and utilize hidden user attributes through multi-turn interaction. The benchmark spans three progressively realistic settings: the classic 20 Questions game, Personalized Question Answering, and Personalized Text Summarization. All tasks share a tri-agent framework (User, Assistant, Judge), enabling turn-level evaluation of elicitation and adaptation. Our results reveal that while LLMs can indeed surface latent information through dialogue, their success varies dramatically with context, from 32% to 98%, depending on task complexity, topic, and number of hidden attributes. This benchmark provides the first systematic framework for studying latent information discovery in personalized interaction, highlighting that effective preference inference remains an open frontier for building truly adaptive AI systems.
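The tri-agent loop described above can be sketched as follows. This is a minimal illustration with stub agents; the function names, the direct-question policy, and the coverage-based scoring rule are our assumptions, not the benchmark's actual implementation (which uses LLMs in each role).

```python
from dataclasses import dataclass

@dataclass
class Turn:
    question: str
    answer: str
    score: float

def run_episode(hidden_attrs, max_turns=3):
    """Simulate one dialogue: the Assistant probes for hidden user attributes,
    the User responds, and the Judge scores each turn by attribute coverage."""
    attrs = sorted(hidden_attrs)
    revealed = set()
    transcript = []
    for t in range(max_turns):
        # Assistant (stub policy): ask about the next attribute in order.
        target = attrs[t % len(attrs)]
        question = f"Do you care about {target}?"
        # User (stub simulator): reveals the attribute when asked directly.
        answer = f"Yes, {target} matters to me."
        revealed.add(target)
        # Judge (stub): per-turn score = fraction of hidden attributes uncovered.
        score = len(revealed) / len(attrs)
        transcript.append(Turn(question, answer, score))
    return transcript

episode = run_episode({"cuisine", "budget", "location"})
print([round(t.score, 2) for t in episode])  # → [0.33, 0.67, 1.0]
```

In the benchmark itself, each stub would be an LLM call: the User agent holds the hidden attribute profile, the Assistant must decide what to ask, and the Judge produces the context-aware per-turn assessment.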