🤖 AI Summary
To address the deployment challenges of large language models (LLMs) for game NPC dialogue, namely high computational overhead, latency, and ill-defined knowledge boundaries, this paper proposes a lightweight, modular dialogue system. The method introduces a persona-driven memory architecture that decouples static role definitions from dynamically swappable memory modules, enabling runtime memory updates and persistent context retention without model retraining or reloading. Using compact models (DistilGPT-2, TinyLlama-1.1B-Chat, and Mistral-7B-Instruct), the system combines synthetic-data fine-tuning with modular memory management to achieve low-latency inference, long-horizon memory preservation, and strong persona consistency on consumer-grade hardware. Experimental results demonstrate effectiveness and scalability under resource constraints, and suggest transfer potential to other interactive AI applications such as virtual assistants.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, yet their applicability to dialogue systems in computer games remains limited. This limitation arises from their substantial hardware requirements, latency constraints, and the necessity of maintaining clearly defined knowledge boundaries within a game setting. In this paper, we propose a modular NPC dialogue system that leverages Small Language Models (SLMs), fine-tuned to encode specific NPC personas and integrated with runtime-swappable memory modules. These memory modules preserve character-specific conversational context and world knowledge, enabling expressive interactions and long-term memory without retraining or model reloading during gameplay. We comprehensively evaluate our system using three open-source SLMs (DistilGPT-2, TinyLlama-1.1B-Chat, and Mistral-7B-Instruct), each fine-tuned on synthetic persona-aligned data and benchmarked on consumer-grade hardware. While our approach is motivated by applications in gaming, its modular design and persona-driven memory architecture hold significant potential for broader adoption in domains requiring expressive, scalable, and memory-rich conversational agents, such as virtual assistants, customer-support bots, and interactive educational systems.