Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of modeling dynamic and implicit user preferences in situated conversational recommendation (SCR), where the visual scene and the dialogue jointly shape user interests and both may change over the course of a conversation. The paper proposes SiPeR (situated preference reasoning), a framework that combines two mechanisms. Scene transition estimation judges whether the current scene satisfies the user's needs and, when it does not, guides the user toward a more suitable scene. Bayesian inverse inference uses likelihoods from multimodal large language models (MLLMs) to predict user preferences over the candidate items within the scene. Experiments on two benchmark datasets report that SiPeR outperforms prior approaches in both recommendation accuracy and response generation quality.

📝 Abstract

Situated conversational recommendation (SCR), which utilizes visual scenes grounded in specific environments and natural language dialogue to deliver contextually appropriate recommendations, has emerged as a promising research direction due to its close alignment with real-world scenarios. Compared to traditional recommendations, SCR requires a deeper understanding of dynamic and implicit user preferences, as the surrounding scene often influences users' underlying interests, while both may evolve across conversations. This complexity significantly impacts the timing and relevance of recommendations. To address this, we propose situated preference reasoning (SiPeR), a novel framework that integrates two core mechanisms: (1) Scene transition estimation, which estimates whether the current scene satisfies user needs, and guides the user toward a more suitable scene when necessary; and (2) Bayesian inverse inference, which leverages the likelihood of multimodal large language models (MLLMs) to predict user preferences about candidate items within the scene. Extensive experiments on two representative benchmarks demonstrate SiPeR's superiority in both recommendation accuracy and response generation quality. The code and data are available at https://github.com/DongdingLin/SiPeR.
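The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform prior, the 0.5 threshold, and the toy log-likelihood values are all assumptions made for illustration, and in practice the log-likelihoods would come from an MLLM scoring the dialogue conditioned on each candidate item. The sketch applies Bayes' rule, P(item | dialogue) ∝ P(dialogue | item) · P(item), and then uses a hypothetical scene-satisfaction probability to choose between recommending an item and suggesting a scene transition.

```python
import math


def posterior_over_items(log_likelihoods, log_priors=None):
    """Bayes' rule in log space: P(item | dialogue) ∝ P(dialogue | item) * P(item).

    log_likelihoods: dict mapping item -> log P(dialogue | item).
    These values are assumed to be supplied by an MLLM; here they are plain numbers.
    log_priors: optional dict mapping item -> log P(item); uniform if omitted (assumption).
    """
    if log_priors is None:
        log_priors = {item: 0.0 for item in log_likelihoods}
    scores = {item: ll + log_priors[item] for item, ll in log_likelihoods.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    normalizer = sum(math.exp(s - top) for s in scores.values())
    return {item: math.exp(s - top) / normalizer for item, s in scores.items()}


def decide_action(p_scene_satisfies, posterior, threshold=0.5):
    """Hypothetical decision rule: recommend only if the scene likely fits the user.

    p_scene_satisfies: assumed output of a scene transition estimator, in [0, 1].
    threshold: illustrative cut-off, not a value taken from the paper.
    """
    if p_scene_satisfies < threshold:
        return ("transition", None)
    best_item = max(posterior, key=posterior.get)
    return ("recommend", best_item)


# Toy example with made-up log-likelihoods for three candidate items.
toy_log_likelihoods = {"red sofa": -1.0, "oak table": -2.0, "floor lamp": -3.0}
toy_posterior = posterior_over_items(toy_log_likelihoods)
print(decide_action(0.9, toy_posterior))
print(decide_action(0.2, toy_posterior))
```

Working in log space keeps the normalization stable when an MLLM assigns very small sequence probabilities, which is the usual situation for multi-token utterances.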

Problem

Research questions and friction points this paper is trying to address.

situated conversational recommendation

dynamic preferences

implicit preferences

scene understanding

preference reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

situated conversational recommendation

scene transition estimation

Bayesian inverse inference