Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM

πŸ“… 2026-03-19
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
While expert role prompting can enhance the alignment of large language models (LLMs) in generative tasks, it often degrades performance on discriminative tasks and lacks a unified mechanism to balance both objectives. To address this, this work proposes PRISM, a novel framework that, for the first time, enables intent-conditioned self-distillation of expert roles using only the model’s own generated data, coupled with lightweight gated LoRA adapters for dynamic routing. Requiring no external data, PRISM significantly improves alignment with human preferences and safety in generative tasks across diverse LLMs while preserving discriminative accuracy, all with minimal memory and computational overhead.

πŸ“ Abstract
Persona prompting can steer LLM generation toward a domain-specific tone and pattern. This behavior enables use cases in multi-agent systems, where diverse interactions are crucial, and in human-centered tasks that require strong human alignment. Prior works offer mixed evidence on the utility of personas: some report performance gains from expert personas in certain domains and note their contribution to data diversity in synthetic data creation, while others find near-zero or negative impact on general utility. To fully leverage the benefits of LLM personas while avoiding their harms, a more comprehensive investigation of the underlying mechanism is needed. In this work, we study how model optimization, task type, prompt length, and prompt placement affect expert persona effectiveness across instruction-tuned and reasoning LLMs, and we provide insight into the conditions under which expert personas fail and succeed. Based on our findings, we develop a pipeline named PRISM (Persona Routing via Intent-based Self-Modeling) that self-distills an intent-conditioned expert persona into a gated LoRA adapter through a bootstrapping process requiring no external data, models, or knowledge. PRISM enhances human-preference and safety alignment on generative tasks while maintaining accuracy on discriminative tasks across all models, with minimal memory and compute overhead.
Problem

Research questions and friction points this paper is trying to address.

expert personas
LLM alignment
accuracy degradation
persona prompting
human preference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert Personas
Intent-Based Routing
Self-Distillation
LoRA Adapter
LLM Alignment
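The gated-LoRA routing idea above can be illustrated with a minimal numpy sketch. This is not the paper's implementation; the dimensions, the sigmoid intent gate `w_gate`, and all variable names are hypothetical. It shows the core mechanism: a scalar gate computed from the input scales a low-rank LoRA update on top of a frozen base weight, so the adapter engages for one intent (e.g., generative) and falls back to the base model for another (e.g., discriminative).

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 8, 2, 4  # hidden size, LoRA rank, scaling (hypothetical values)

W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.1    # LoRA down-projection
B = np.zeros((d, r))                 # LoRA up-projection (zero-initialized)
w_gate = rng.normal(size=d)          # intent-gate weights (hypothetical)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_lora_forward(x):
    """Base output plus an intent-gated low-rank update.

    The scalar gate g(x) stands in for PRISM's intent-based router:
    near 1, the distilled expert-persona adapter is applied; near 0,
    the output reduces to the frozen base model.
    """
    g = sigmoid(w_gate @ x)              # intent gate in (0, 1)
    delta = (alpha / r) * (B @ (A @ x))  # low-rank LoRA update
    return W @ x + g * delta

x = rng.normal(size=d)
y = gated_lora_forward(x)
# With B zero-initialized, the adapter starts as a no-op:
assert np.allclose(y, W @ x)
```

Zero-initializing `B` is the standard LoRA convention: training begins exactly at the base model's behavior, and only the gated update learned during self-distillation moves it away.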
πŸ”Ž Similar Papers
No similar papers found.
Zizhao Hu, University of Southern California
Mohammad Rostami, University of Southern California
Jesse Thomason, Assistant Professor, University of Southern California (Natural Language Processing, Artificial Intelligence, Robotics)