🤖 AI Summary
This work proposes the first generative augmented reality (AR) framework that supports on-demand caching and selective correction to address the challenges of efficiency, consistency, and safety in real-time future-frame generation. By integrating a diffusion-based world model with a region-selective editing mechanism, the method generates semantically plausible future AR frames while ensuring alignment with real-world observations through real-time perceptual anchoring in critical regions. Evaluated in driving scenarios, the system demonstrates high rendering efficiency alongside precise preservation of semantic structure and reliable safety-aware corrections, thereby validating its practicality and robustness in complex, dynamic environments.
📝 Abstract
Generative world models offer a compelling foundation for augmented-reality (AR) applications: by predicting future image sequences that incorporate deliberate visual edits, they enable temporally coherent augmented future frames that can be computed ahead of time and cached, avoiding per-frame rendering from scratch in real time. In this work, we present SEGAR, a preliminary framework that combines a diffusion-based world model with a selective correction stage to support this vision. The world model generates augmented future frames, applying region-specific edits while leaving other regions unchanged, and the correction stage subsequently aligns safety-critical regions with real-world observations while preserving intended augmentations elsewhere. We demonstrate this pipeline in driving scenarios as a representative setting where semantic region structure is well defined and real-world feedback is readily available. We view this as an early step toward generative world models as practical AR infrastructure, where future frames can be generated, cached, and selectively corrected on demand.
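The cache-then-correct pipeline described above can be illustrated with a minimal sketch. The names here (`selective_correction`, `critical_mask`, the dictionary cache) are illustrative assumptions, not the paper's actual API: pre-generated augmented frames are cached by timestep, and at display time a boolean mask marking safety-critical regions is used to overwrite those pixels with the live observation while keeping the generated augmentation everywhere else.

```python
import numpy as np

def selective_correction(cached_frame, observed_frame, critical_mask):
    """Blend a cached generated frame with the live camera frame.

    Pixels inside the safety-critical mask are replaced by the real
    observation; all other pixels keep the cached augmentation.
    """
    # Broadcast the H x W mask over the channel dimension (H x W x 1).
    mask = critical_mask.astype(cached_frame.dtype)[..., None]
    return observed_frame * mask + cached_frame * (1.0 - mask)

# Hypothetical cache of pre-generated augmented frames, indexed by timestep.
cache = {t: np.full((4, 4, 3), 0.5, dtype=np.float32) for t in range(3)}

observed = np.ones((4, 4, 3), dtype=np.float32)  # stand-in for the camera frame
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # hypothetical safety-critical region

corrected = selective_correction(cache[0], observed, mask)
```

In a real system the mask would come from a perception module (e.g. segmentation of pedestrians or vehicles in the driving setting), and the blend could be soft-edged rather than binary; this sketch only shows the region-selective replacement idea.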