🤖 AI Summary
Existing dialogue systems rarely optimize holistic dialogue impressions—such as consistency, persona coherence, and empathy—because they rely on single-turn response modeling and lack interpretable, multi-dimensional evaluation. To address this, the authors propose a dialogue-level impression alignment method that operates over the full dialogue. Their approach uses reward models covering 12 fine-grained impression metrics, integrated in a framework combining supervised fine-tuning (SFT) and reinforcement learning from AI feedback (RLAIF), together with LLM-based reward modeling, zero-/few-shot prompting, and dialogue policy optimization. Both automatic and human evaluations show improvements across the impression metrics as well as gains in dialogue naturalness, demonstrating that dialogue-level persona and emotional consistency can be modeled and optimized directly.
📝 Abstract
To improve user engagement during conversations with dialogue systems, we must improve not only individual dialogue responses but also dialogue impressions such as consistency, personality, and empathy throughout the entire dialogue. While such dialogue systems have been developing rapidly with the help of large language models (LLMs), reinforcement learning from AI feedback (RLAIF) has attracted attention as a way to align LLM-based dialogue models with these dialogue impressions. In RLAIF, a reward model based on another LLM creates a training signal for the LLM-based dialogue model using zero-shot/few-shot prompting techniques. However, evaluating an entire dialogue by prompting LLMs alone is challenging. In this study, we prepared reward models for 12 metrics related to the impression of the entire dialogue through supervised fine-tuning (SFT) of LLMs. We then tuned our dialogue models using the reward model signals as feedback to improve these impressions. The results of automatic and human evaluations showed that tuning the dialogue model with our impression-based reward models improved both the individual metrics and the naturalness of the dialogue responses.
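The aggregation step described above—scoring a whole dialogue with per-metric reward models and combining the scores into one RLAIF training signal—can be sketched as follows. This is a minimal illustration, not the paper's implementation: the stub scoring heuristic, the metric names beyond the three given in the abstract, and the equal-weight mean aggregation are all assumptions.

```python
from typing import Callable, Dict, List

Dialogue = List[str]  # alternating user/system utterances

def make_stub_reward_model(metric: str) -> Callable[[Dialogue], float]:
    """Stand-in for an SFT'd LLM reward model that scores the whole
    dialogue on one impression metric, returning a value in [0, 1].
    A real system would run a fine-tuned LLM here."""
    def score(dialogue: Dialogue) -> float:
        # Toy heuristic in place of a fine-tuned LLM scorer.
        return min(1.0, 0.1 * len(dialogue) + 0.01 * len(metric))
    return score

# Only these three of the paper's 12 metrics are named in the abstract.
METRICS = ["consistency", "personality", "empathy"]

def dialogue_reward(dialogue: Dialogue,
                    models: Dict[str, Callable[[Dialogue], float]]) -> float:
    """Combine per-metric scores into one scalar reward for RLAIF.
    Equal weighting is an assumption, not the paper's stated choice."""
    scores = [models[m](dialogue) for m in METRICS]
    return sum(scores) / len(scores)

models = {m: make_stub_reward_model(m) for m in METRICS}
reward = dialogue_reward(["Hi!", "Hello, how are you today?"], models)
```

In an actual RLAIF loop, this scalar `reward` would feed a policy-gradient update (e.g., PPO) of the dialogue model after each sampled dialogue.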