🤖 AI Summary
The standard language modeling loss alone is often inadequate for open-domain dialogue, failing to capture the core semantic content of responses. To address this, the authors propose Bag-of-Keywords (BoK), an auxiliary loss that predicts only the keywords of the next utterance rather than all of its words (as in the conventional Bag-of-Words loss), thereby capturing the central idea of the response. The BoK loss is jointly optimized with the primary language modeling loss via a weighted sum and is architecture-agnostic, compatible with both encoder-decoder (e.g., T5) and decoder-only (e.g., DialoGPT) models. Experiments on DailyDialog and Persona-Chat show that adding BoK loss improves response quality and enables post-hoc interpretability. Furthermore, the BoK-trained language model (BoK-LM) can serve as a reference-free evaluation metric, performing comparably to state-of-the-art metrics on several dialogue evaluation datasets.
📝 Abstract
The standard language modeling (LM) loss by itself has been shown to be inadequate for effective dialogue modeling. As a result, various training approaches, such as auxiliary loss functions and leveraging human feedback, are being adopted to enrich open-domain dialogue systems. One such auxiliary loss function is Bag-of-Words (BoW) loss, defined as the cross-entropy loss for predicting all the words/tokens of the next utterance. In this work, we propose a novel auxiliary loss named Bag-of-Keywords (BoK) loss to capture the central thought of the response through keyword prediction and leverage it to enhance the generation of meaningful and interpretable responses in open-domain dialogue systems. BoK loss upgrades the BoW loss by predicting only the keywords or critical words/tokens of the next utterance, intending to estimate the core idea rather than the entire response. We incorporate BoK loss into both encoder-decoder (T5) and decoder-only (DialoGPT) architectures and train the models to minimize the weighted sum of BoK and LM (BoK-LM) loss. We perform our experiments on two popular open-domain dialogue datasets, DailyDialog and Persona-Chat. We show that the inclusion of BoK loss improves the dialogue generation of backbone models while also enabling post-hoc interpretability. We also study the effectiveness of BoK-LM loss as a reference-free metric and observe comparable performance to the state-of-the-art metrics on various dialogue evaluation datasets.
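The training objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes the model produces a single vector of vocabulary logits for the bag-of-keywords prediction, that the BoK loss is the summed cross-entropy of the next utterance's keyword tokens under that distribution, and that `lam` (the weight in the weighted sum) and the function names are hypothetical.

```python
import numpy as np

def bok_loss(logits: np.ndarray, keyword_ids: np.ndarray) -> float:
    """Bag-of-Keywords loss: cross-entropy of the keyword tokens of the
    next utterance under a softmax over the vocabulary logits.

    Like BoW loss, but restricted to keyword token ids only.
    """
    # numerically stable softmax over the vocabulary
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    # sum of negative log-probabilities of the keyword tokens
    return float(-np.log(probs[keyword_ids]).sum())

def bok_lm_loss(lm_loss: float, logits: np.ndarray,
                keyword_ids: np.ndarray, lam: float = 1.0) -> float:
    """Total training objective: weighted sum of the LM loss and the
    BoK loss (lam is a hypothetical weighting hyperparameter)."""
    return lm_loss + lam * bok_loss(logits, keyword_ids)

# toy example: 4-token vocabulary, keywords are tokens 0 and 1
logits = np.array([2.0, 0.5, -1.0, 0.0])
keyword_ids = np.array([0, 1])
print(bok_lm_loss(lm_loss=1.0, logits=logits, keyword_ids=keyword_ids, lam=0.5))
```

In practice the keyword set would come from an off-the-shelf keyword extractor applied to the gold response, and both losses would be computed batch-wise with automatic differentiation rather than NumPy.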