🤖 AI Summary
This work addresses two problems in conversational news recommendation: implicit user intents that lack retrievable keywords, and the failure of retrieval-augmented generation (RAG) for cold-start users. The authors propose an intent-driven Semantic ID (SID) generation framework under a Generate-then-Match paradigm. Two-stage training, consisting of multi-task SID alignment followed by Chain-of-Thought distillation from GPT-4, teaches an LLM to map user intents to hierarchical SID prefixes, which are then fuzzy-matched against the current news pool so that every recommendation is grounded in an existing article. A Profile-Aware Dual-Signal Reasoning (PADR) mechanism additionally lets cold-start users obtain valid recommendations from profiles alone. On a mainstream Chinese news platform, the 7B model reports 0% hallucination and a 12.4% L1 match rate over a 152K SID space (4× the random baseline), rising to 18.0% for cold-start users (6× random). It matches GPT-4+Hybrid RAG on L1 and surpasses it on finer-grained metrics (L2 2×, Category +1.2pp) at roughly 100× lower cost.
📝 Abstract
Conversational news recommendation requires grounding each suggestion in a rapidly evolving article corpus while addressing implicit user intents that lack explicit retrievable keywords. To characterize this scenario, we identify six intent types from production dialogues; five of them are implicit and pose fundamental challenges to standard RAG pipelines, forming a critical retrieve-first bottleneck. To address these issues, we introduce intent-driven Semantic ID (SID) generation under a Generate-then-Match paradigm. With two-stage training that consists of multi-task SID alignment and GPT-4 Chain-of-Thought distillation, an LLM maps diverse intents to hierarchical SID prefixes, which are then fuzzy-matched to the current news pool to guarantee fully grounded recommendations. Profile-Aware Dual-Signal Reasoning (PADR) further enables cold-start users to obtain valid recommendations using only profiles. On a mainstream Chinese news platform, our 7B model achieves 0% hallucination and 12.4% L1 match in the 152K open-generation SID space (4x random baseline). It matches GPT-4+Hybrid RAG on L1 while surpassing it on finer-grained metrics (L2 2x, Category +1.2pp) at ~100x lower cost. For cold-start users, where existing baselines score 0%, our model achieves 18.0% L1 (6x random), the highest among all user groups.
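The match step of Generate-then-Match can be illustrated with a minimal sketch. The abstract does not specify how fuzzy matching is implemented, so everything below is an assumption for illustration: SIDs are modeled as tuples of integer codes, the news pool is a plain dictionary, and the hypothetical `match_sid` helper falls back to progressively shorter prefixes until some article in the pool shares one. Because results are drawn only from the pool, a recommendation can never point at an article that does not exist.

```python
from typing import Dict, List, Tuple

# Assumption: a hierarchical SID is a tuple of integer codes, coarse to fine.
SID = Tuple[int, ...]


def match_sid(generated: SID, pool: Dict[SID, List[str]]) -> Tuple[int, List[str]]:
    """Return (matched_depth, article_ids) for the longest prefix of
    `generated` shared by at least one SID in `pool`.

    Hypothetical helper: illustrates longest-prefix fallback only and is
    not the paper's actual matching procedure.
    """
    for depth in range(len(generated), 0, -1):
        prefix = generated[:depth]
        # Collect every pooled article whose SID starts with this prefix.
        hits = [
            article_id
            for sid, article_ids in pool.items()
            if sid[:depth] == prefix
            for article_id in article_ids
        ]
        if hits:
            return depth, hits
    # No level matched: return nothing rather than invent an article.
    return 0, []


# Toy news pool (made-up codes and article IDs).
pool: Dict[SID, List[str]] = {
    (3, 17, 5): ["a1"],
    (3, 17, 9): ["a2"],
    (3, 40, 1): ["a3"],
    (8, 2, 2): ["a4"],
}

depth, articles = match_sid((3, 17, 7), pool)
print(depth, articles)
```

In this toy pool the generated SID `(3, 17, 7)` has no exact match, so the lookup falls back to the two-level prefix `(3, 17)` and returns the two articles that share it. A production system would likely index prefixes (for example with a trie) instead of scanning the pool on every query.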