🤖 AI Summary
This work addresses the challenge of enabling users to rapidly construct virtual scenes via drag-and-drop interactions with agents and objects, while automatically generating executable, semantically coherent behavioral narratives. Methodologically, the system injects semantic metadata into the scene and serializes it into natural-language prompts for lightweight large language models (LLMs) such as Phi-3 and Gemma-2B; a custom parser then maps the LLM's textual output to structured action sequences that drive real-time behavior, animation, and interaction. The key contribution is the first use of *unfine-tuned* lightweight LLMs as end-to-end "narrative behavior compilers," relying solely on prompt engineering and deterministic parsing to bridge high-level semantics and low-level execution. Experiments demonstrate feasibility and practicality: for moderately complex scenes, average response latency is under 1.2 seconds, and action parsing accuracy exceeds 94%.
📄 Abstract
This paper presents a system for procedurally generating agent-based narratives using large language models (LLMs). Users can drag and drop multiple agents and objects into a scene, with each entity automatically assigned semantic metadata describing its identity, role, and potential interactions. The scene structure is then serialized into a natural-language prompt and sent to an LLM, which returns a structured string describing a sequence of actions and interactions among agents and objects. The returned string encodes who performed which actions, when, and how. A custom parser interprets this string and triggers coordinated agent behaviors, animations, and interaction modules. The system supports agent-based scenes, dynamic object manipulation, and diverse interaction types. Designed for ease of use and rapid iteration, it enables the generation of virtual agent activity suitable for prototyping agent narratives. The performance of the developed system was evaluated using four popular lightweight LLMs, measuring each model's processing and response times under scenarios of varying complexity. The collected data were analyzed to compare consistency across the examined scenarios and to highlight the relative efficiency and suitability of each model for procedural agent-based narrative generation. The results demonstrate that LLMs can reliably translate high-level scene descriptions into executable agent-based behaviors.
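As a rough illustration of the pipeline the abstract describes (scene metadata serialized into a prompt, then the LLM's structured reply deterministically parsed into action records), the following is a minimal sketch. The entity names, prompt wording, and the pipe-delimited `actor | action | target | step` output format are all assumptions for illustration; the paper's actual prompt template and action grammar are not specified here.

```python
import re

# Hypothetical scene: entities carrying semantic metadata (identity, role,
# potential interactions), as assigned automatically on drag-and-drop.
scene = [
    {"id": "chef", "role": "agent", "interactions": ["pick_up", "cook", "serve"]},
    {"id": "pan", "role": "object", "interactions": ["pick_up", "cook_with"]},
]

def serialize_scene(entities):
    """Serialize scene metadata into a natural-language prompt for the LLM."""
    lines = [f"- {e['id']} ({e['role']}): can {', '.join(e['interactions'])}"
             for e in entities]
    return ("Scene entities:\n" + "\n".join(lines) +
            "\nReturn one action per line as: actor | action | target | step")

# Example structured string an LLM might return (format is an assumption).
llm_output = "chef | pick_up | pan | 1\nchef | cook | pan | 2"

ACTION_RE = re.compile(r"^\s*(\w+)\s*\|\s*(\w+)\s*\|\s*(\w+)\s*\|\s*(\d+)\s*$")

def parse_actions(text):
    """Deterministically map the LLM's text into ordered action records."""
    actions = []
    for line in text.splitlines():
        m = ACTION_RE.match(line)
        if m:  # skip malformed lines rather than failing outright
            actor, action, target, step = m.groups()
            actions.append({"actor": actor, "action": action,
                            "target": target, "step": int(step)})
    return sorted(actions, key=lambda a: a["step"])

print(parse_actions(llm_output))
```

In a real system each parsed record would be dispatched to the engine's behavior, animation, and interaction modules; the deterministic regex parse is what keeps free-form LLM text from reaching the execution layer unchecked.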