Score Before You Speak: Improving Persona Consistency in Dialogue Generation using Response Quality Scores

📅 2025-08-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Addressing the challenges of modeling persona consistency and the limited diversity of existing dialogue data, this paper proposes Score-Before-Speaking (SBS), a framework that explicitly incorporates response quality scores into the end-to-end training of generative dialogue models. Using semantic similarity as a proxy for response quality, SBS performs controllable data augmentation via noun substitution and conditions generation on score signals embedded in the input prompt, jointly optimizing persona consistency and response quality in a single step. Empirical evaluation demonstrates consistent gains across model scales, from millions to billions of parameters, and SBS achieves new state-of-the-art results on the PERSONA-CHAT and ConvAI2 benchmarks. Ablation studies confirm the critical role of score signals in consistency modeling and substantiate improved generalization to diverse, persona-consistent dialogues.


πŸ“ Abstract
Persona-based dialogue generation is an important milestone towards building conversational artificial intelligence. Despite the ever-improving capabilities of large language models (LLMs), effectively integrating persona fidelity in conversations remains challenging due to the limited diversity in existing dialogue data. We propose a novel framework SBS (Score-Before-Speaking), which outperforms previous methods and yields improvements for both million and billion-parameter models. Unlike previous methods, SBS unifies the learning of responses and their relative quality into a single step. The key innovation is to train a dialogue model to correlate augmented responses with a quality score during training and then leverage this knowledge at inference. We use noun-based substitution for augmentation and semantic similarity-based scores as a proxy for response quality. Through extensive experiments with benchmark datasets (PERSONA-CHAT and ConvAI2), we show that score-conditioned training allows existing models to better capture a spectrum of persona-consistent dialogues. Our ablation studies also demonstrate that including scores in the input prompt during training is superior to conventional training setups. Code and further details are available at https://arpita2512.github.io/score_before_you_speak
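The abstract describes a data pipeline: augment gold responses by noun substitution, score each variant against the original (the paper uses semantic similarity as the quality proxy), and embed that score in the training prompt. The sketch below illustrates the idea only; the noun vocabulary, prompt template, and the lexical-overlap similarity (a dependency-free stand-in for an embedding-based semantic similarity model) are all illustrative assumptions, not the paper's implementation.

```python
import random

# Toy noun list; a real pipeline would identify nouns with a POS tagger.
NOUN_VOCAB = ["dog", "cat", "guitar", "piano", "pizza", "salad"]

def augment_by_noun_substitution(response, rng):
    """Create a variant of `response` by replacing one known noun with another."""
    tokens = response.split()
    noun_positions = [i for i, t in enumerate(tokens) if t in NOUN_VOCAB]
    if not noun_positions:
        return response
    i = rng.choice(noun_positions)
    tokens[i] = rng.choice([n for n in NOUN_VOCAB if n != tokens[i]])
    return " ".join(tokens)

def similarity_score(a, b):
    """Jaccard token overlap as a stand-in for the paper's
    embedding-based semantic similarity score (range 0..1)."""
    sa, sb = set(a.split()), set(b.split())
    return round(len(sa & sb) / len(sa | sb), 2)

def build_training_example(persona, context, response, score):
    """Embed the quality score in the input prompt (score-conditioned training).
    The exact prompt format here is hypothetical."""
    return f"<score={score}> persona: {persona} context: {context} response: {response}"

rng = random.Random(0)
gold = "i love playing guitar with my dog"
variant = augment_by_noun_substitution(gold, rng)
score = similarity_score(gold, variant)
example = build_training_example("i play guitar", "what do you do for fun ?", variant, score)
print(example)
```

Training on many such (score, variant) pairs is what lets the model correlate the score token with response quality, which is the "single step" unification the abstract highlights.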
Problem

Research questions and friction points this paper is trying to address.

Maintaining persona consistency in generated dialogue responses
Limited diversity in existing persona-based dialogue data
Decoupled learning of responses and their quality in prior training setups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies response learning and quality scoring in a single training step
Uses noun-based substitution for response augmentation
Uses semantic similarity scores as a proxy for response quality
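The abstract also notes that the score knowledge is leveraged at inference. One plausible reading is that the model is prompted with a high target score so it generates toward the top of the learned quality spectrum; the prompt template below is a hypothetical sketch, not the paper's exact format.

```python
def build_inference_prompt(persona, context, target_score=1.0):
    """Condition generation on a high target score at inference time
    (hypothetical prompt format; the paper's template may differ)."""
    return f"<score={target_score}> persona: {persona} context: {context} response:"

prompt = build_inference_prompt("i play guitar", "what do you do for fun ?")
print(prompt)
```

The generated response would then be whatever the trained model completes after `response:`, with the score token steering it toward persona-consistent, high-quality outputs.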