🤖 AI Summary
Addressing the challenges of modeling persona consistency and insufficient data diversity in dialogue generation, this paper proposes Score-Before-Speaking (SBS), a novel framework that explicitly incorporates response quality scores into end-to-end training of generative models. Leveraging semantic similarity as a proxy for response quality, SBS performs controllable data augmentation via noun substitution and conditions generation on score signals embedded in the input prompt. This unified approach jointly optimizes persona consistency and response quality. Empirical evaluation demonstrates consistent performance gains across model scales, from millions to billions of parameters. On the PERSONA-CHAT and ConvAI2 benchmarks, SBS achieves new state-of-the-art results. Ablation studies confirm the critical role of score signals in consistency modeling and demonstrate improved generalization to diverse, persona-consistent dialogues.
📄 Abstract
Persona-based dialogue generation is an important milestone towards building conversational artificial intelligence. Despite the ever-improving capabilities of large language models (LLMs), effectively integrating persona fidelity into conversations remains challenging due to the limited diversity of existing dialogue data. We propose a novel framework, SBS (Score-Before-Speaking), which outperforms previous methods and yields improvements for both million- and billion-parameter models. Unlike previous methods, SBS unifies the learning of responses and their relative quality into a single step. The key innovation is to train a dialogue model to correlate augmented responses with a quality score during training and then leverage this knowledge at inference. We use noun-based substitution for augmentation and semantic similarity-based scores as a proxy for response quality. Through extensive experiments on benchmark datasets (PERSONA-CHAT and ConvAI2), we show that score-conditioned training allows existing models to better capture a spectrum of persona-consistent dialogues. Our ablation studies also demonstrate that including scores in the input prompt during training is superior to conventional training setups. Code and further details are available at https://arpita2512.github.io/score_before_you_speak
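The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names and the `<score=…>` prompt token are hypothetical, and `difflib.SequenceMatcher` stands in for the embedding-based semantic similarity the abstract refers to, purely to keep the sketch dependency-free.

```python
import difflib

def substitute_nouns(response: str, noun_map: dict) -> str:
    """Naive word-level substitution as a stand-in for noun-based augmentation."""
    return " ".join(noun_map.get(w, w) for w in response.split())

def quality_score(original: str, augmented: str) -> float:
    """Proxy quality score in [0, 1]; a string-similarity ratio stands in for
    the semantic similarity used in the paper."""
    return round(difflib.SequenceMatcher(None, original, augmented).ratio(), 2)

def build_prompt(persona: str, context: str, score: float) -> str:
    """Score-conditioned training prompt: the score signal is embedded in the
    input so the model learns to correlate responses with their quality."""
    return f"<score={score}> persona: {persona} context: {context}"

# Augment a gold response, score it, and condition the prompt on that score.
gold = "i love dogs and music"
augmented = substitute_nouns(gold, {"dogs": "cats"})
score = quality_score(gold, augmented)
prompt = build_prompt("i have two cats", "what pets do you have?", score)
```

At inference, the same mechanism is used in reverse: prepending the highest score token requests a top-quality, persona-consistent response.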