AI Summary
Existing crowd simulation methods largely neglect the influence of linguistic dialogue on social navigation and emergent behavior, resulting in interactions limited to simplistic orientation adjustments and predefined goals, and therefore lacking realism. This paper proposes a language-driven multi-agent crowd simulation framework that integrates large language models (LLMs), persona-based dialogue systems, multimodal state inputs (visual, affective, physical), and context-aware navigation mechanisms. Agents generate natural, socially grounded dialogue conditioned on personality, emotional state, and environmental perception, which in turn dynamically guides navigation and enables spontaneous emergence of collective behaviors (e.g., clustering, dispersion). Experiments demonstrate significant improvements in self-organization, environmental adaptability, and socio-behavioral authenticity across complex scenarios. To our knowledge, this is the first approach to establish dialogue as the primary driver of crowd dynamics simulation.
Abstract
Animating and simulating crowds using an agent-based approach is a well-established area where every agent in the crowd is individually controlled such that global human-like behaviour emerges. We observe that human navigation and movement in crowds are often influenced by complex social and environmental interactions, driven mainly by language and dialogue. However, most existing work does not consider these dimensions and leads to animations where agent-agent and agent-environment interactions are largely limited to steering and fixed higher-level goal extrapolation.
We propose a novel method that exploits large language models (LLMs) to control agents' movement. Our method has two main components: a dialogue system and language-driven navigation. We periodically query agent-centric LLMs conditioned on character personalities, roles, desires, and relationships to control the generation of inter-agent dialogue when necessitated by the spatial and social relationships with neighbouring agents. We then use the conversation and each agent's personality, emotional state, vision, and physical state to control the navigation and steering of each agent. Our model thus enables agents to make motion decisions based on both their perceptual inputs and the ongoing dialogue.
We validate our method in two complex scenarios that exemplify the interplay between social interactions, steering, and crowding. In these scenarios, we observe that grouping and ungrouping of agents automatically occur. Additionally, our experiments show that our method serves as an information-passing mechanism within the crowd. As a result, our framework produces more realistic crowd simulations, with emergent group behaviours arising naturally from any environmental setting.
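The two components described above, a dialogue system and language-driven navigation, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Agent` fields, the prompt wording, the `talk_radius` trigger, and the `FOLLOW`/`CONTINUE` reply convention are all assumptions made for illustration, and `stub_llm` is a hypothetical stand-in for a real per-agent LLM query.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import Callable, List, Tuple

Vec = Tuple[float, float]


@dataclass
class Agent:
    # All fields are illustrative assumptions, not the paper's data model.
    name: str
    persona: str          # free-text personality / role description
    emotion: str          # current emotional state
    pos: Vec              # current position
    goal: Vec             # individual navigation goal
    memory: List[str] = field(default_factory=list)  # dialogue heard so far


def build_prompt(agent: Agent, other: Agent) -> str:
    """Condition the query on persona, emotion, and recent dialogue."""
    return (
        f"You are {agent.name}, {agent.persona}, feeling {agent.emotion}. "
        f"{other.name} is nearby. Recent dialogue: {agent.memory[-3:]}. "
        "Reply with one line ending in FOLLOW or CONTINUE."
    )


def step_toward(pos: Vec, target: Vec, speed: float) -> Vec:
    """Move at most `speed` units from `pos` toward `target`."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = hypot(dx, dy)
    if dist <= speed:
        return target
    return (pos[0] + speed * dx / dist, pos[1] + speed * dy / dist)


def simulate_step(agents: List[Agent], llm: Callable[[str], str],
                  talk_radius: float = 2.0, speed: float = 1.0) -> None:
    """One tick: talk to the first neighbour in range, then steer."""
    snapshot = {a.name: a.pos for a in agents}  # positions at start of tick
    for a in agents:
        target = a.goal
        ax, ay = snapshot[a.name]
        for b in agents:
            if b is a:
                continue
            bx, by = snapshot[b.name]
            if hypot(bx - ax, by - ay) <= talk_radius:
                reply = llm(build_prompt(a, b))
                utterance = f"{a.name}: {reply}"
                a.memory.append(utterance)
                b.memory.append(utterance)  # the listener hears it too
                if reply.strip().endswith("FOLLOW"):
                    target = snapshot[b.name]  # dialogue overrides the goal
                break
        a.pos = step_toward(snapshot[a.name], target, speed)


def stub_llm(prompt: str) -> str:
    """Hypothetical deterministic stand-in for a real LLM call."""
    if "friendly" in prompt:
        return "Let's walk together. FOLLOW"
    return "Sorry, I am in a hurry. CONTINUE"
```

In this sketch, an agent whose reply ends in `FOLLOW` steers toward its conversation partner instead of its own goal, which is one simple way grouping could emerge from dialogue. Because each utterance is stored in both agents' memories, later prompts can carry information onward through the crowd.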