Vague Preference Policy Learning for Conversational Recommendation

📅 2023-06-07
🏛️ ACM Transactions on Information Systems
📈 Citations: 1
Influential: 0
🤖 AI Summary
In conversational recommendation systems (CRS), user preferences are often vague and non-binary, yet conventional approaches hard-filter candidates on accept/reject feedback, which can discard relevant items. To address this, the paper introduces a new scenario, Vague Preference Multi-round Conversational Recommendation (VPMCR), and a solution, the Vague Preference Policy Learning (VPPL) framework, comprising two modules: (1) Ambiguity-aware Soft Estimation (ASE), which assigns non-zero soft confidence scores to all candidate items using choice-based signals and a time-aware preference decay, avoiding zero-probability truncation; and (2) Dynamism-aware Policy Learning (DPL), which leverages the preference distribution from ASE to decide whether to query attributes or recommend items as user preferences shift across turns. Experiments on multiple benchmark datasets show consistent improvements over state-of-the-art methods in both recommendation relevance and interaction robustness.
📝 Abstract
Conversational recommendation systems (CRS) effectively address information asymmetry by dynamically eliciting user preferences through multi-turn interactions. However, existing CRS methods commonly assume that users have clear, definite preferences for one or multiple target items. This assumption can lead to over-trusting user feedback, treating accepts/rejects as definitive signals to filter items and reduce the candidate space, potentially causing over-filtering and excluding relevant alternatives. In reality, users often exhibit vague preferences, lacking well-defined inclinations for certain attribute types (e.g., color, pattern), and their decision-making process during interactions is rarely binary. Instead, users’ choices are relative, reflecting a range of preferences rather than strict likes or dislikes. To address this issue, we introduce a novel scenario called Vague Preference Multi-round Conversational Recommendation (VPMCR), which employs a soft estimation mechanism to assign non-zero confidence scores to all candidate items, accommodating users’ vague and dynamic preferences while mitigating over-filtering. In the VPMCR setting, we introduce a solution called Vague Preference Policy Learning (VPPL), which consists of two main components: Ambiguity-aware Soft Estimation (ASE) and Dynamism-aware Policy Learning (DPL). ASE aims to accommodate the ambiguity in user preferences by estimating preference scores for both directed and inferred preferences, employing a choice-based approach and a time-aware preference decay strategy. DPL implements a policy learning framework, leveraging the preference distribution from ASE, to guide the conversation and adapt to changes in users’ preferences for making recommendations or querying attributes. Extensive experiments conducted on diverse datasets demonstrate the effectiveness of VPPL within the VPMCR framework, outperforming existing methods and setting a new benchmark for CRS research. Our work represents a significant advancement in accommodating the inherent ambiguity and relative decision-making processes exhibited by users, improving the overall performance and applicability of CRS in real-world settings.
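The abstract's core mechanism, soft estimation with time-aware preference decay, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, signal encoding, decay factor, and confidence floor are all illustrative assumptions.

```python
import math

def soft_preference_scores(click_history, attributes, gamma=0.9, floor=0.05):
    """Toy sketch of ambiguity-aware soft estimation (illustrative, not the paper's ASE).

    click_history: list of (turn, attribute, signal), signal in {+1, -1}
    attributes: all candidate attribute values
    gamma: per-turn decay factor, so older feedback counts less
    floor: minimum confidence, so no candidate is filtered to zero probability
    """
    latest_turn = max((t for t, _, _ in click_history), default=0)
    scores = {a: 0.0 for a in attributes}
    for turn, attr, signal in click_history:
        # time-aware preference decay: weight recent turns more heavily
        scores[attr] += signal * gamma ** (latest_turn - turn)
    # squash to (0, 1) and apply a non-zero floor instead of hard filtering
    conf = {a: max(floor, 1 / (1 + math.exp(-s))) for a, s in scores.items()}
    total = sum(conf.values())
    return {a: c / total for a, c in conf.items()}
```

Note that even an attribute the user rejected keeps a small positive score, which is precisely what distinguishes this soft estimation from binary accept/reject filtering.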
Problem

Research questions and friction points this paper is trying to address.

Addresses user vague preferences in conversational recommendation systems
Mitigates over-filtering by accommodating non-binary user preferences
Handles dynamic preference changes through soft estimation mechanisms
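The over-filtering problem above can be made concrete by contrasting hard and soft treatment of a rejection. The snippet below is a hypothetical illustration of the contrast, not code from the paper; the item representation and penalty value are assumptions.

```python
def hard_filter(candidates, rejected_attrs):
    # conventional CRS: a reject removes every matching item outright
    return [c for c in candidates if not (c["attrs"] & rejected_attrs)]

def soft_filter(candidates, rejected_attrs, penalty=0.3):
    # VPMCR-style soft estimation: a reject only down-weights matching
    # items, so relevant alternatives are never dropped to zero probability
    return {c["name"]: penalty if (c["attrs"] & rejected_attrs) else 1.0
            for c in candidates}
```

A user who rejects "red" once may still accept a red item with other appealing attributes; the soft variant keeps that possibility open.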
Innovation

Methods, ideas, or system contributions that make the work stand out.

Soft estimation mechanism for vague preferences
Ambiguity-aware soft estimation with time decay
Dynamism-aware policy learning for adaptive recommendations
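The policy side, deciding each turn between querying an attribute and recommending items from the current preference distribution, is learned with reinforcement learning in the paper. As a stand-in, a simple entropy heuristic can illustrate the decision; the threshold, tie-breaking rule, and function names below are all assumptions.

```python
import math

def next_action(pref_dist, entropy_threshold=1.0, top_k=3):
    """Heuristic stand-in for dynamism-aware policy learning (not the paper's DPL).

    pref_dist: attribute -> probability, e.g. from soft estimation
    """
    entropy = -sum(p * math.log(p) for p in pref_dist.values() if p > 0)
    if entropy > entropy_threshold:
        # preferences still vague: query the currently most likely attribute
        return ("ask", max(pref_dist, key=pref_dist.get))
    # preferences sharp enough: recommend items matching the top attributes
    ranked = sorted(pref_dist, key=pref_dist.get, reverse=True)
    return ("recommend", ranked[:top_k])
```

The design choice mirrors the bullet above: when the distribution is flat (vague preferences), the agent keeps eliciting; once it concentrates, the agent commits to a recommendation.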