Tricking LLM-Based NPCs into Spilling Secrets

📅 2025-08-25

📈 Citations: 0

✨ Influential: 0

career value

157K/year

🤖 AI Summary

This work identifies, for the first time, a critical security vulnerability—covert background information leakage—in large language model (LLM)-driven non-player characters (NPCs) within fictional narratives. Addressing whether adversarial prompt injection induces NPCs to disclose confidential world-building details, we propose a novel attack framework integrating role-playing, context manipulation, and targeted prompt steering, alongside a standardized evaluation benchmark. Empirical evaluation across diverse open- and closed-weight LLMs reveals substantial leakage of hidden narrative elements under multiple attack strategies, with peak disclosure rates reaching 89%. The study uncovers fundamental design flaws in current NPC architectures—including absence of privacy boundary modeling and insufficient role-consistency constraints—and establishes a reproducible benchmark for assessing safety alignment of LLMs in interactive storytelling. Furthermore, it provides actionable insights for developing robust defensive mechanisms against unauthorized information extraction in narrative AI systems.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) are increasingly used to generate dynamic dialogue for game NPCs. However, their integration raises new security concerns. In this study, we examine whether adversarial prompt injection can cause LLM-based NPCs to reveal hidden background secrets that are meant to remain undisclosed.

Problem

Research questions and friction points this paper is trying to address.

Adversarial prompts extract hidden secrets from LLM-based NPCs

Security risks in game NPCs using dynamic dialogue generation

Testing if prompt injection reveals undisclosed background information

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial prompt injection technique

Testing LLM-based NPCs security

Extracting hidden background secrets

🔎 Similar Papers

No similar papers found.