🤖 AI Summary
Conversational recommendation systems (CRSs) suffer from insufficient integration of dialogue context and external knowledge graphs (KGs), leading to coarse-grained user preference modeling and suboptimal recommendation accuracy. To address this, we propose a stepwise curriculum learning framework in which an F-Former progressively aligns dialogue history with KG entities over three stages, enabling fine-grained semantic matching and knowledge-consistent fusion. We further introduce a dual-prefix prompt tuning mechanism that injects the fused representation into a frozen pre-trained language model through lightweight, task-specific prefixes for dialogue and recommendation. Our approach achieves state-of-the-art performance on two benchmark datasets, significantly outperforming existing methods in both recommendation accuracy (Recall@10) and conversational quality (BLEU-4, F1). These results empirically validate the effectiveness of curriculum-guided knowledge alignment for CRSs.
📝 Abstract
Conversational recommender systems (CRSs) aim to proactively capture user preferences through natural language dialogue and recommend high-quality items. To achieve this, a CRS gathers user preferences via a dialogue module and builds user profiles through a recommendation module to generate appropriate recommendations. However, existing CRSs struggle to capture the deep semantics of user preferences and dialogue context. In particular, efficiently integrating external knowledge graph (KG) information into dialogue generation and recommendation remains a pressing issue. Traditional approaches typically combine KG information directly with dialogue content, an approach that often fails to capture complex semantic relationships, resulting in recommendations that may not align with user expectations.
To address these challenges, we introduce STEP, a conversational recommender centered on pre-trained language models that combines curriculum-guided context-knowledge fusion with lightweight task-specific prompt tuning. At its heart, an F-Former progressively aligns the dialogue context with knowledge-graph entities through a three-stage curriculum, resolving fine-grained semantic mismatches. The fused representation is then injected into the frozen language model via two minimal yet adaptive prefix prompts: a conversation prefix that steers response generation toward user intent, and a recommendation prefix that biases item ranking toward knowledge-consistent candidates. This dual-prompt scheme lets the model share cross-task semantics while respecting the distinct objectives of dialogue and recommendation. Experimental results on two public datasets show that STEP outperforms mainstream methods in both recommendation accuracy and dialogue quality.
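The dual-prefix idea described above can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's implementation: the frozen language model is stood in for by a fixed linear layer, the F-Former's output is a given vector (`fused`), and the prefix sizes, names, and conditioning-by-addition scheme are all assumptions chosen for brevity. It shows only the core mechanism: one frozen backbone, two small trainable prefixes, one per task.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size (illustrative)

# "Frozen" language model stub: a fixed projection whose weights are never updated.
W_frozen = rng.standard_normal((D, D))

def frozen_lm(embeddings):
    # Stand-in for a frozen transformer: fixed linear layer + tanh + mean pool.
    return np.tanh(embeddings @ W_frozen).mean(axis=0)

# Fused context-knowledge representation; in STEP this would come from the F-Former.
fused = rng.standard_normal(D)

# Two lightweight, task-specific prefixes (the only trainable parameters here).
P_conv = rng.standard_normal((2, D)) * 0.1  # conversation prefix, 2 virtual tokens
P_rec = rng.standard_normal((2, D)) * 0.1   # recommendation prefix, 2 virtual tokens

def run_task(task, token_embs):
    prefix = P_conv if task == "conversation" else P_rec
    # Condition each virtual token on the fused representation (broadcast add),
    # then prepend the prefix to the ordinary token embeddings.
    conditioned = prefix + fused
    return frozen_lm(np.vstack([conditioned, token_embs]))

tokens = rng.standard_normal((5, D))  # embedded dialogue history
h_conv = run_task("conversation", tokens)
h_rec = run_task("recommendation", tokens)
# Same frozen backbone, same dialogue tokens: only the prefix differs per task,
# so the two task representations diverge without touching the LM weights.
```

Because only `P_conv` and `P_rec` (and the fusion module) would receive gradients, the scheme keeps per-task adaptation cheap while the shared frozen backbone carries cross-task semantics.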