🤖 AI Summary
To address the deployment challenges of large language models (LLMs) for game NPC dialogue, namely high computational overhead, latency, and ill-defined knowledge boundaries, this paper proposes a lightweight, modular dialogue system. The method introduces a persona-driven memory architecture that decouples static role definitions from dynamically swappable memory modules, enabling runtime memory updates and persistent context retention without model retraining or reloading. Using compact models (DistilGPT-2, TinyLlama-1.1B-Chat, and Mistral-7B-Instruct), the system combines synthetic-data fine-tuning with modular memory management to achieve low-latency inference, long-horizon memory preservation, and strong persona consistency on consumer-grade hardware. Experimental results demonstrate effectiveness and scalability under resource constraints, and suggest transfer potential to other interactive AI applications such as virtual assistants.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, yet their applicability to dialogue systems in computer games remains limited. This limitation arises from their substantial hardware requirements, latency constraints, and the necessity of maintaining clearly defined knowledge boundaries within a game setting. In this paper, we propose a modular NPC dialogue system that leverages Small Language Models (SLMs), fine-tuned to encode specific NPC personas and integrated with runtime-swappable memory modules. These memory modules preserve character-specific conversational context and world knowledge, enabling expressive interactions and long-term memory without retraining or model reloading during gameplay. We comprehensively evaluate our system using three open-source SLMs (DistilGPT-2, TinyLlama-1.1B-Chat, and Mistral-7B-Instruct), each fine-tuned on synthetic persona-aligned data and benchmarked on consumer-grade hardware. While our approach is motivated by applications in gaming, its modular design and persona-driven memory architecture hold significant potential for broader adoption in domains requiring expressive, scalable, and memory-rich conversational agents, such as virtual assistants, customer-support bots, and interactive educational systems.