PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the trade-off between persona robustness and persona fidelity in large language models: training that makes a model robust to persona prompts can also weaken its ability to portray a character faithfully. To ease this tension without adding inference-time overhead, the authors propose PerMix-RLVR. It is a persona-mixed training strategy built on reinforcement learning with verifiable rewards (RLVR), and it targets both persona stability and persona fidelity. In experiments, PerMix-RLVR improves the persona stability score (PSS) over RLVR by 21.2% on MATH500 and improves persona fidelity by 11.4% on PersonaGym, while preserving downstream task performance.

📝 Abstract

Persona prompting has been widely adopted to steer the behavior of large language models (LLMs) and to improve their instruction-following performance by assigning them specific characters. However, identifying an optimal persona is time-consuming, and its impact on output quality remains poorly understood. Prior work has mainly addressed this issue at the prompt level via inference-time strategies, which incur additional computation. In this work, we avoid inference-time prompt search by tackling persona sensitivity during training, aiming to train models that adapt their behavior to diverse personas while preserving task performance. In particular, we find that reinforcement learning with verifiable rewards (RLVR) systematically reduces sensitivity to persona prompts. This finding also reveals an inherent trade-off of outcome-based optimization: while RLVR improves robustness on tasks with verifiable goals, it can also degrade persona expressivity where it is needed, e.g., in in-character role-playing. To address this limitation, we propose PerMix-RLVR, a persona-mixed RLVR strategy that mitigates the persona robustness-fidelity trade-off, preserving strong robustness to harmful persona variation while enabling faithful persona adoption when required. Concretely, PerMix-RLVR improves the persona stability score (PSS) over RLVR by +21.2% on MATH500, while also enhancing persona fidelity by +11.4% on PersonaGym.
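The abstract names two ingredients, an outcome-based (verifiable) reward and persona mixing during training, but does not give their exact form. The following is one minimal, hypothetical way to wire them together in Python. It is not the authors' implementation. Every specific here is an illustrative assumption:

* the persona pool `PERSONAS`
* the mixing probability `p_persona`
* the exact-match verifier on the final line
* the GRPO-style group-normalized advantage
* `persona_spread`, a simple accuracy-spread proxy for persona sensitivity that is not the paper's PSS

```python
import random
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Example:
    question: str
    answer: str  # reference answer used by the (assumed) verifier


# Hypothetical persona pool for illustration; not taken from the paper.
PERSONAS = [
    "You are a meticulous mathematician.",
    "You are a pirate who loves riddles.",
    "You are a hurried, careless student.",
]


def build_mixed_prompt(ex: Example, rng: random.Random, p_persona: float = 0.5) -> str:
    """Assumed mixing rule: with probability p_persona, prepend one randomly
    sampled persona; otherwise return the bare task prompt."""
    if rng.random() < p_persona:
        return f"{rng.choice(PERSONAS)}\n\n{ex.question}"
    return ex.question


def verifiable_reward(completion: str, ex: Example) -> float:
    """Binary outcome reward: 1.0 if the last non-empty line of the completion
    exactly matches the reference answer, else 0.0 (an assumed verifier)."""
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return 1.0 if lines and lines[-1] == ex.answer.strip() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage (assumed, not specified in the abstract): normalize
    each reward by the mean and std of its group of samples for one prompt."""
    mu, sd = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def persona_spread(acc_by_persona: dict[str, float]) -> float:
    """Illustrative sensitivity proxy (NOT the paper's PSS): population standard
    deviation of task accuracy across personas; lower means more stable."""
    return pstdev(list(acc_by_persona.values()))
```

For example, `build_mixed_prompt(ex, random.Random(0))` returns either the bare question or the question prefixed with one sampled persona. The rewards from several sampled completions of that prompt would then be passed to `group_advantages`. Mixing persona-prefixed and plain prompts in the same training stream is one plausible reading of "persona-mixed". The paper may use a different mixing scheme.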

Problem

Research questions and friction points this paper is trying to address.

persona prompting

reinforcement learning

verifiable rewards

persona expressivity

robustness-fidelity trade-off

Innovation

Methods, ideas, or system contributions that make the work stand out.

PerMix-RLVR

persona expressivity

reinforcement learning with verifiable rewards