Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene

πŸ“… 2025-07-25
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the problem of generating dynamic, event-driven narratives and interactive motions for multiple characters in 3D scenes. We propose the first event-guided hierarchical generation framework: (1) a large language model decomposes textual narratives into spatiotemporally coherent, fine-grained event sequences; (2) a high-level spatial guidance module interprets inter-character relative positions and scene constraints; and (3) a motion synthesis module drives coordinated multi-character animation with precise spatial localization. Our contributions are threefold: (1) the first approach enabling contextualized, multi-agent behavioral generation in large-scale, diverse settings; (2) the first dedicated benchmark tailored to this task, covering narrative fidelity, spatial consistency, and interaction plausibility; and (3) comprehensive experiments and user studies demonstrating significant improvements in contextual coherence, interaction plausibility, and scalability over prior methods.

πŸ“ Abstract
In this work, we propose a framework that creates a lively virtual dynamic scene with contextual motions of multiple humans. Generating multi-human contextual motion requires holistic reasoning over dynamic relationships among human-human and human-scene interactions. We harness the power of a large language model (LLM) to digest the contextual complexity within textual input and convert the task into tangible subproblems, allowing us to generate multi-agent behavior at a scale not considered before. Specifically, our event generator formulates the temporal progression of a dynamic scene as a sequence of small events. Each event calls for a well-defined motion involving relevant characters and objects. Next, we synthesize the motions of characters at positions sampled based on spatial guidance. We employ a high-level module to deliver scalable yet comprehensive context, translating events into relative descriptions that enable the retrieval of precise coordinates. As the first to address this problem at scale and with diversity, we offer a benchmark to assess diverse aspects of contextual reasoning. Benchmark results and user studies show that our framework effectively captures scene context with high scalability. The code and benchmark, along with result videos, are available at our project page: https://rms0329.github.io/Event-Driven-Storytelling/.
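The abstract describes a hierarchical pipeline: an LLM-based event generator splits the narrative into ordered events, and a spatial guidance module translates each event's relative description into concrete scene coordinates before motion synthesis. The sketch below illustrates that decomposition flow in minimal form; it is not the paper's implementation, and the `Event` fields, the sentence-splitting stand-in for the LLM call, and the `resolve_position` lookup are all simplified assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One fine-grained narrative unit: who does what, and where (relative)."""
    order: int          # temporal index within the story
    action: str         # textual description of the motion to synthesize
    characters: list    # names of characters the event involves


def decompose_narrative(narrative: str) -> list:
    # Hypothetical stand-in for the paper's LLM event generator: a real
    # system would prompt an LLM to produce spatiotemporally coherent
    # events. Here we simply split the text into sentences, in order.
    sentences = [s.strip() for s in narrative.split(".") if s.strip()]
    return [Event(order=i, action=s, characters=[])
            for i, s in enumerate(sentences)]


def resolve_position(event: Event, scene_objects: dict) -> tuple:
    # Hypothetical spatial-guidance step: map a relative description
    # ("walks to the sofa") to coordinates via known object positions.
    # The paper retrieves precise coordinates under scene constraints;
    # we approximate that with a name lookup plus a fixed offset.
    for name, (x, y) in scene_objects.items():
        if name in event.action.lower():
            return (x + 0.5, y)   # stand slightly beside the object
    return (0.0, 0.0)             # fallback: scene origin
```

A usage example: `decompose_narrative("Alice walks to the sofa. Bob waves at Alice.")` yields two ordered events, and `resolve_position` on the first event with `{"sofa": (2.0, 3.0)}` returns a point beside the sofa, which would then be handed to the motion synthesis stage.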
Problem

Research questions and friction points this paper is trying to address.

Generating multi-human contextual motion in 3D scenes
Holistic reasoning for human-human and human-scene interactions
Scalable event-driven storytelling with diverse contextual motions
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven multi-human contextual motion generation
Event-based temporal scene progression formulation
Scalable high-level context translation for motion