🤖 AI Summary
This work addresses the limited generalization and high customization cost of large language models (LLMs) in role-playing tasks by proposing an efficient role-generalization modeling framework. First, we construct a synthetic persona dataset of over 10,000 diverse roles, built on character profiles derived from Persona Hub personas. Second, we introduce a dual-path data construction paradigm, combining role-aware response generation and response rewriting, to enable multi-stage role alignment via data distillation. Finally, we perform supervised fine-tuning on LLaMA-3 8B. Experiments demonstrate substantial improvements in zero-shot role transfer, with role-playing dialogue quality approaching that of GPT-4o. We publicly release the full synthetic persona dataset and instruction-following dialogue corpus, providing infrastructure and a methodological foundation for open research on role-aware AI.
📝 Abstract
Customizable role-playing in large language models (LLMs), also known as character generalization, is gaining increasing attention for its versatility and cost-efficiency in developing and deploying role-playing dialogue agents. This study explores a large-scale data synthesis approach to equipping LLMs with character generalization capabilities. We begin by synthesizing large-scale character profiles from Persona Hub personas and then explore two strategies, response rewriting and response generation, to create character-aligned instructional responses. To validate the effectiveness of our synthetic instruction-tuning data for character generalization, we perform supervised fine-tuning (SFT) on the LLaMA-3 8B model. Our best-performing model improves upon the original LLaMA-3 8B Instruct model and achieves performance comparable to GPT-4o models on role-playing dialogue. We release our synthetic characters and instruction-tuning dialogues to support public research.