Object-Driven Narrative in AR: A Scenario-Metaphor Framework with VLM Integration

📅 2025-04-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AR narrative systems rely on static object labels and coordinates, neglecting semantic relationships among objects (e.g., a wedding ring on a nightstand symbolizing marital conflict), resulting in rigid storytelling, superficial spatial utilization, and misalignment with AR Foundation. This paper proposes an object-driven narrative framework for AR, treating the physical environment as an active narrative agent. Leveraging Vision-Language Models (VLMs), it performs state-aware three-layer semantic parsing—physical, functional, and metaphorical—and establishes a bidirectional JSON narrative interface. A novel STAM evaluation framework (Spatial, Temporal, Analogical-Metaphorical) supports joint spatial-semantic anchoring and metaphorical reasoning. The framework eliminates reliance on predefined scripts, enabling dynamic mapping from environmental symbols to narrative anchors. A user study shows that 70% of participants significantly altered their perception of real-world objects due to symbolically grounded narrative delivery.

📝 Abstract
Most adaptive AR storytelling systems define environmental semantics using simple object labels and spatial coordinates, limiting narratives to rigid, pre-defined logic. This oversimplification overlooks the contextual significance of object relationships; for example, a wedding ring on a nightstand might suggest marital conflict, yet it is treated as just "two objects" in space. To address this, we explored integrating Vision Language Models (VLMs) into AR pipelines. However, several challenges emerged: First, stories generated with simple prompt guidance lacked narrative depth and made little use of space. Second, spatial semantics were underutilized, failing to support meaningful storytelling. Third, pre-generated scripts struggled to align with AR Foundation's object naming and coordinate systems. We propose a scene-driven AR storytelling framework that reimagines environments as active narrative agents, built on three innovations: 1. State-aware object semantics: We decompose object meaning into physical, functional, and metaphorical layers, allowing VLMs to distinguish subtle narrative cues between similar objects. 2. Structured narrative interface: A bidirectional JSON layer maps VLM-generated metaphors to AR anchors, maintaining spatial and semantic coherence. 3. STAM evaluation framework: A three-part experimental design evaluates narrative quality, highlighting both strengths and limitations of VLM-AR integration. Our findings show that the system can generate stories from the environment itself, not just place them on top of it. In user studies, 70% of participants reported seeing real-world objects differently when narratives were grounded in environmental symbolism. By merging VLMs' generative creativity with AR's spatial precision, this framework introduces a novel object-driven storytelling paradigm, transforming passive spaces into active narrative landscapes.
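The paper does not publish its JSON schema, but the abstract's two core ideas — three-layer object semantics and a bidirectional mapping between VLM metaphors and AR anchors — can be sketched as data structures. All field names below are illustrative assumptions, not the authors' actual interface:

```python
import json

def make_object_record(label, position, functional, metaphorical):
    """Decompose one detected object into the three semantic layers
    described in the abstract: physical, functional, metaphorical."""
    return {
        "physical": {"label": label, "position": position},  # what AR Foundation tracks
        "functional": functional,                            # what the object is for
        "metaphorical": metaphorical,                        # what it can symbolize
    }

def to_narrative_anchor(record, story_beat):
    """Map a VLM-generated metaphor back onto a spatial anchor
    (the VLM-to-AR direction of the bidirectional JSON layer)."""
    return {
        "anchor_label": record["physical"]["label"],
        "anchor_position": record["physical"]["position"],
        "symbol": record["metaphorical"],
        "story_beat": story_beat,
    }

# The abstract's own example: a wedding ring on a nightstand.
ring = make_object_record(
    label="wedding_ring",
    position=[0.42, 0.91, -0.30],  # hypothetical metres in AR session space
    functional="worn jewellery",
    metaphorical="commitment set aside",
)
anchor = to_narrative_anchor(ring, story_beat="She took it off last night.")
print(json.dumps(anchor, indent=2))
```

Keeping the physical layer's label and position verbatim in the anchor record is what would let generated narrative content stay aligned with AR Foundation's object naming and coordinate systems, the third friction point the abstract raises.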
Problem

Research questions and friction points this paper is trying to address.

Overcoming rigid AR storytelling with contextual object relationships
Enhancing narrative depth in AR using Vision Language Models
Aligning VLM-generated stories with AR spatial semantics
Innovation

Methods, ideas, or system contributions that make the work stand out.

State-aware object semantics for narrative cues
Bidirectional JSON layer for semantic coherence
STAM framework evaluating narrative quality
Yusi Sun
The University of Hong Kong, Hong Kong, China
Haoyan Guan
The University of Hong Kong
vision language model · few-shot learning
Leith Kin Yep Chan
The University of Hong Kong, Hong Kong, China
Yong Hong Kuo
The University of Hong Kong, Hong Kong, China