🤖 AI Summary
This paper addresses the limitations in photorealism and interactivity inherent in traditional augmented reality (AR), which relies on multi-stage modular composition pipelines. It proposes Generative Augmented Reality (GAR), a paradigm that redefines AR as "world resynthesis" rather than "world composition." GAR employs a unified generative backbone network that jointly encodes environmental perception, virtual content, and interaction signals in an end-to-end manner, enabling continuous, video-level augmented output. Its core innovation lies in establishing a computational mapping between AR and generative AI that supports conditional real-time generation and unified inference. The paper formalizes a theoretical framework for GAR, empirically validates its technical feasibility, and outlines forward-looking application pathways targeting high-fidelity rendering, natural interaction, and deep immersion. This work positions GAR as a foundational paradigm and technical blueprint for next-generation AR systems.
📝 Abstract
This paper introduces Generative Augmented Reality (GAR) as a next-generation paradigm that reframes augmentation as world re-synthesis rather than the world composition performed by a conventional AR engine. GAR replaces the conventional engine's multi-stage modules with a unified generative backbone, in which environmental sensing, virtual content, and interaction signals are jointly encoded as conditioning inputs for continuous video generation. We formalize the computational correspondence between AR and GAR, survey the technical foundations that make real-time generative augmentation feasible, and outline prospective applications that leverage its unified inference model. We envision GAR as a future AR paradigm that delivers high-fidelity experiences in realism, interactivity, and immersion, while raising new research challenges in core technologies, content ecosystems, and ethical and societal implications.