🤖 AI Summary
Addressing the challenges of modeling persona consistency and insufficient data diversity in dialogue generation, this paper proposes Score-Before-Speaking (SBS), a novel framework that explicitly incorporates response quality scores into end-to-end training of generative models. Leveraging semantic similarity as a proxy for response quality, SBS performs controllable data augmentation via noun substitution and conditions generation on score signals embedded in the input prompt. This unified approach jointly optimizes persona consistency and response quality. Empirical evaluation demonstrates consistent performance gains across model scales, from millions to billions of parameters. On the PERSONA-CHAT and ConvAI2 benchmarks, SBS achieves new state-of-the-art results. Ablation studies confirm the critical role of score signals in consistency modeling and demonstrate improved generalization to diverse, persona-consistent dialogues.
📄 Abstract
Persona-based dialogue generation is an important milestone towards building conversational artificial intelligence. Despite the ever-improving capabilities of large language models (LLMs), effectively integrating persona fidelity into conversations remains challenging due to the limited diversity of existing dialogue data. We propose a novel framework, SBS (Score-Before-Speaking), which outperforms previous methods and yields improvements for both million- and billion-parameter models. Unlike previous methods, SBS unifies the learning of responses and their relative quality into a single step. The key innovation is to train a dialogue model to correlate augmented responses with a quality score during training and then leverage this knowledge at inference. We use noun-based substitution for augmentation and semantic similarity-based scores as a proxy for response quality. Through extensive experiments on benchmark datasets (PERSONA-CHAT and ConvAI2), we show that score-conditioned training allows existing models to better capture a spectrum of persona-consistent dialogues. Our ablation studies also demonstrate that including scores in the input prompt during training is superior to conventional training setups. Code and further details are available at https://arpita2512.github.io/score_before_you_speak
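The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names and the `<score=…>` prompt token are hypothetical, and `difflib.SequenceMatcher` stands in for the embedding-based semantic similarity the abstract refers to, purely to keep the sketch dependency-free.

```python
import difflib

def substitute_nouns(response: str, noun_map: dict) -> str:
    """Naive word-level substitution as a stand-in for noun-based augmentation."""
    return " ".join(noun_map.get(w, w) for w in response.split())

def quality_score(original: str, augmented: str) -> float:
    """Proxy quality score in [0, 1]; a string-similarity ratio stands in for
    the semantic similarity used in the paper."""
    return round(difflib.SequenceMatcher(None, original, augmented).ratio(), 2)

def build_prompt(persona: str, context: str, score: float) -> str:
    """Score-conditioned training prompt: the score signal is embedded in the
    input so the model learns to correlate responses with their quality."""
    return f"<score={score}> persona: {persona} context: {context}"

# Augment a gold response, score it, and condition the prompt on that score.
gold = "i love dogs and music"
augmented = substitute_nouns(gold, {"dogs": "cats"})
score = quality_score(gold, augmented)
prompt = build_prompt("i have two cats", "what pets do you have?", score)
```

At inference, the same mechanism is used in reverse: prepending the highest score token requests a top-quality, persona-consistent response.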