🤖 AI Summary
To address the problem of contextual and knowledge-base redundancy interfering with response matching in retrieval-based dialogue systems, this paper proposes a multi-turn response selection model that introduces, for the first time in this task, a dual-level dynamic filtering mechanism for context and knowledge. Specifically, a query-driven pre-filtering stage identifies salient context and knowledge snippets, while a response-guided post-filtering stage refines relevance modeling. The model integrates both word-level and utterance-level attention to enable fine-grained, three-way interaction among context, knowledge, and candidate responses. Evaluated on two mainstream benchmark datasets, the proposed approach achieves significant improvements over state-of-the-art models, demonstrating its effectiveness in precisely identifying and focusing on relevant contextual and knowledge fragments. This work provides a novel and effective paradigm for enhancing response selection accuracy in multi-turn retrieval-based dialogue systems.
📝 Abstract
Recently, knowledge-grounded conversations in the open domain have gained great attention from researchers. Existing works on retrieval-based dialogue systems have devoted tremendous effort to building neural matching models, where all of the context and knowledge contents are used to match the response candidate with various representation methods. In practice, different parts of the context and knowledge are differentially important for recognizing the proper response candidate, as many utterances are useless due to topic shift. This excess of useless information in the context and knowledge can interfere with the matching process and lead to inferior performance. To address this problem, we propose a multi-turn Response Selection Model that can Detect the relevant parts of the Context and Knowledge collection (RSM-DCK). Our model first uses the recent context as a query to pre-select relevant parts of the context and knowledge collection at the word-level and utterance-level semantics. Further, the response candidate interacts with the selected context and knowledge collection respectively. Finally, the fused representation of the context and response candidate is utilized to post-select the relevant parts of the knowledge collection more confidently for matching. We test our proposed model on two benchmark datasets. Evaluation results indicate that our model achieves better performance than the existing methods, and can effectively detect the relevant context and knowledge for response selection.
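The pre-select/post-select flow described above can be sketched as a toy relevance-filtering routine. This is a minimal illustration only, not the paper's actual architecture: it assumes pre-computed utterance-level embedding vectors and uses plain dot-product attention, whereas RSM-DCK operates at both word-level and utterance-level semantics with learned representations. All function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def select_relevant(query, segments, top_k):
    """Score each segment (a context utterance or knowledge sentence)
    by dot-product attention with the query, and keep the top_k
    highest-scoring segments in their original order."""
    scores = softmax(segments @ query)
    keep = np.sort(np.argsort(scores)[::-1][:top_k])
    return segments[keep], keep

rng = np.random.default_rng(0)
dim = 8
context = rng.normal(size=(5, dim))    # 5 context utterance vectors
knowledge = rng.normal(size=(4, dim))  # 4 knowledge sentence vectors
response = rng.normal(size=(dim,))     # candidate response vector

# Pre-selection: the most recent context utterance acts as the query.
query = context[-1]
sel_ctx, _ = select_relevant(query, context, top_k=3)
sel_kn, _ = select_relevant(query, knowledge, top_k=2)

# Post-selection: a fused context-response representation (here a
# simple average, purely for illustration) re-filters the knowledge.
fused = (sel_ctx.mean(axis=0) + response) / 2
final_kn, _ = select_relevant(fused, knowledge, top_k=1)
```

In the actual model the fusion and scoring are learned jointly with the matching network; the sketch only conveys the two-stage filtering idea, where knowledge is filtered once by the context query and again, more confidently, by the context-response fusion.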