🤖 AI Summary
This work proposes a method for automatically transforming 2D films into immersive mixed reality experiences. The system uses multimodal large language models to interpret cinematic semantics, applies generative AI to produce narrative-synchronized 3D augmentations, and spatially embeds these enhancements into the user's physical environment via the Meta Quest 3 headset. An elicitation study with eight film students informed five augmentation types: particle effects, surrounding objects, textural overlays, character-driven augmentation, and lighting effects. Evaluation through a technical assessment of 100 video clips, a user study with 12 participants, and interviews with eight film creators indicates that the system enhances viewer immersion and enjoyment, pointing to the potential of generative augmented reality in cinematic storytelling.
📝 Abstract
We introduce CinemaWorld, a generative augmented reality system that augments the viewer's physical surroundings with automatically generated mixed reality 3D content extracted from and synchronized with 2D movie scenes. Our system preprocesses films to extract key features using multimodal large language models (LLMs), generates dynamic 3D augmentations with generative AI, and embeds them spatially into the viewer's physical environment on the Meta Quest 3. To explore the design space of CinemaWorld, we conducted an elicitation study with eight film students, which led us to identify several key augmentation types, including particle effects, surrounding objects, textural overlays, character-driven augmentation, and lighting effects. We evaluated our system through a technical evaluation (N=100 video clips), a user study (N=12), and expert interviews with film creators (N=8). Results indicate that CinemaWorld enhances immersion and enjoyment, suggesting its potential to enrich the film-viewing experience.
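The three-stage pipeline described in the abstract (LLM-based feature extraction, generative 3D augmentation, spatial embedding) could be sketched roughly as follows. This is a minimal illustrative stub, not the paper's implementation: all function names, data shapes, and heuristics are hypothetical stand-ins for the multimodal LLM, the generative model, and the headset runtime.

```python
# Hypothetical sketch of CinemaWorld's three-stage pipeline.
# Every function here is a stub; the real system would call a
# multimodal LLM, a 3D generative model, and the Quest 3 runtime.
from dataclasses import dataclass, field


@dataclass
class SceneFeatures:
    timestamp: float          # seconds into the film
    mood: str                 # e.g. "tense" or "calm"
    elements: list = field(default_factory=list)  # salient on-screen elements


def extract_features(clip_description: str, timestamp: float) -> SceneFeatures:
    """Stage 1: a multimodal LLM would interpret the clip; stubbed heuristic."""
    mood = "tense" if "storm" in clip_description else "calm"
    return SceneFeatures(timestamp, mood, clip_description.split())


def generate_augmentation(features: SceneFeatures) -> dict:
    """Stage 2: generative AI would produce a 3D asset; stubbed as metadata."""
    kind = "particle_effect" if features.mood == "tense" else "lighting_effect"
    return {"type": kind, "sync_at": features.timestamp}


def embed_in_environment(augmentation: dict, anchor: str) -> str:
    """Stage 3: spatial placement in the viewer's room; stubbed as a log line."""
    return (f"{augmentation['type']} anchored to {anchor} "
            f"at t={augmentation['sync_at']}s")


if __name__ == "__main__":
    features = extract_features("storm at sea", timestamp=12.0)
    aug = generate_augmentation(features)
    print(embed_in_environment(aug, anchor="living-room wall"))
```

The key design point the sketch reflects is that augmentations are keyed to film timestamps, so playback can trigger each 3D effect in sync with the corresponding scene.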