🤖 AI Summary
To address insufficient immersion, heavy reliance on training data, and high computational cost in large language model (LLM) role-playing, this paper proposes the Test-Time-Matching (TTM) framework. TTM operates entirely at inference time via context engineering, enabling fine-grained role disentanglement and matching without fine-tuning or any other parameter updates. It explicitly decomposes role characteristics into three orthogonal dimensions: *personality*, *memory*, and *linguistic style*, supporting flexible cross-role composition and dynamic substitution. To our knowledge, TTM is the first method to achieve fully automatic, zero-training disentanglement and controllable recombination of these three dimensions, using a three-stage generative pipeline and a context-driven, test-time matching mechanism. Human evaluation indicates that TTM significantly outperforms existing zero-shot role-playing approaches in dialogue expressiveness, stylistic consistency, and role fidelity.
📝 Abstract
The rapid advancement of large language models (LLMs) has enabled role-playing language agents to demonstrate significant potential in various applications. However, relying solely on prompts and contextual inputs often proves insufficient for achieving deep immersion in specific roles, particularly those of well-known fictional or public figures. Fine-tuning-based approaches, on the other hand, are limited by the difficulty of data collection and the computational resources required for training, which restricts their broader applicability. To address these issues, we propose Test-Time-Matching (TTM), a training-free role-playing framework based on test-time scaling and context engineering. TTM uses LLM agents to automatically decouple a character's features into personality, memory, and linguistic style. Our framework involves a structured, three-stage generation pipeline that uses these features for controlled role-playing. It achieves high-fidelity role-playing and also enables seamless combinations across diverse linguistic styles, and even variations in personality and memory. We evaluate our framework through human assessment, and the results demonstrate that our method achieves outstanding performance in generating expressive and stylistically consistent character dialogues.
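The decoupling of a character into personality, memory, and linguistic style, and the recombination of those dimensions across roles, can be illustrated with a minimal Python sketch. Everything named here (`RoleProfile`, `recombine`, `build_context`, the bracketed context layout, and the example characters) is hypothetical and chosen for illustration only; the abstract does not specify the framework's data structures, prompts, or the contents of its three generation stages, so none of this should be read as the authors' implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RoleProfile:
    """Hypothetical container for the three decoupled feature dimensions."""
    personality: str
    memory: str
    linguistic_style: str


def recombine(base: RoleProfile, **overrides: str) -> RoleProfile:
    """Return a copy of `base` with the named dimensions swapped out."""
    return replace(base, **overrides)


def build_context(profile: RoleProfile, user_message: str) -> str:
    """Assemble a plain-text context for an LLM call.

    The layout is an assumption made for this sketch, not the paper's prompt.
    """
    return "\n".join([
        f"[Personality] {profile.personality}",
        f"[Memory] {profile.memory}",
        f"[Linguistic style] {profile.linguistic_style}",
        f"[User] {user_message}",
    ])


# Two illustrative (made-up) role profiles.
detective = RoleProfile(
    personality="analytical, aloof",
    memory="solved cases in London",
    linguistic_style="precise Victorian English",
)
pirate = RoleProfile(
    personality="boisterous, greedy",
    memory="years at sea",
    linguistic_style="nautical slang",
)

# Keep the detective's personality and memory, borrow the pirate's style.
hybrid = recombine(detective, linguistic_style=pirate.linguistic_style)
print(build_context(hybrid, "Who took the jewels?"))
```

Because the profile is immutable and `recombine` returns a new object, swapping one dimension never alters the source roles, which mirrors the idea that the three dimensions can be substituted independently at inference time.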