🤖 AI Summary
Existing spatial media treat content as static assets, lacking semantic coupling with physical environments and support for deep interaction. This paper proposes MAR-ED, a systematic framework for structuring mixed and augmented reality experiences, grounded in three primitives: event, keyframe, and playback. It enables semantically grounded, interactive replay of past events within the viewer's current physical space. Methodologically, MAR-ED integrates semantic scene-graph modeling, adaptive keyframe extraction, and user-driven dynamic playback, supporting real-time spatio-temporal adaptation and multi-user collaborative narrative reconstruction. Compared with conventional video playback, MAR-ED elevates digital memory into an environment-aware, interactive, and reconfigurable immersive experience, with practical potential in training, cultural heritage activation, and interactive storytelling.
📝 Abstract
We propose the Spatio-Temporal Mixed and Augmented Reality Experience Description (MAR-ED), a novel framework to standardize the representation of past events for interactive and adaptive playback in a user's present physical space. While current spatial media technologies have primarily focused on capturing or replaying content as static assets, often disconnected from the viewer's environment or offering limited interactivity, the means to describe an experience's underlying semantic and interactive structure remain underexplored. MAR-ED is built on three core primitives: 1) Event Primitives for semantic scene-graph representation, 2) Keyframe Primitives for efficient and meaningful data access, and 3) Playback Primitives for user-driven adaptive interactive playback of recorded MAR experiences. The three-stage process of the proposed MAR-ED framework transforms a recorded experience into a unique adaptive MAR experience during playback, where its spatio-temporal structure dynamically conforms to a new environment and its narrative can be altered by live user input. Drawing on this framework, personal digital memories and recorded events can evolve beyond passive 2D/3D videos into immersive, spatially integrated group experiences, opening new paradigms for training, cultural heritage, and interactive storytelling without requiring complex, per-user adaptive rendering.
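To make the three-primitive structure concrete, the sketch below models them as simple data types. This is a hypothetical illustration only; all class names, fields, and values are assumptions for exposition and are not taken from the paper's specification.

```python
from dataclasses import dataclass, field

@dataclass
class EventPrimitive:
    """Semantic scene-graph node: an entity, its action, and its relations."""
    entity: str
    action: str
    relations: dict = field(default_factory=dict)  # e.g. {"object": "cup"}

@dataclass
class KeyframePrimitive:
    """A semantically meaningful snapshot enabling efficient access."""
    timestamp_s: float
    event_ids: list  # indices into the recorded event list

@dataclass
class PlaybackPrimitive:
    """Binds a keyframe to an anchor in the viewer's current space."""
    keyframe: KeyframePrimitive
    spatial_anchor: str  # hypothetical anchor label in the new environment
    interactive: bool = True

# Minimal recorded experience: one event, one keyframe, one playback binding.
events = [EventPrimitive("person", "picks_up", {"object": "cup", "near": "table"})]
kf = KeyframePrimitive(timestamp_s=3.2, event_ids=[0])
pb = PlaybackPrimitive(keyframe=kf, spatial_anchor="kitchen_counter")
print(pb.spatial_anchor, events[pb.keyframe.event_ids[0]].action)
```

The key design point this sketch reflects is the decoupling the abstract describes: events carry semantics independent of geometry, keyframes index into them for access, and only the playback layer binds content to the viewer's physical environment.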