CleAR: Robust Context-Guided Generative Lighting Estimation for Mobile Augmented Reality

πŸ“… 2024-11-04
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address discontinuous environment-lighting estimates in mobile augmented reality (AR), caused by the narrow camera field-of-view and low dynamic range of AR devices, this paper proposes the first context-guided generative lighting estimation framework designed for on-device deployment. Methodologically, it fuses real-time visual-inertial odometry (VIO) and RGB sensor data to drive a two-stage conditional diffusion model that synthesizes 360° high-dynamic-range (HDR) environment maps; a lightweight on-device refinement component further ensures both physical plausibility and generation diversity. Key contributions include: (1) pioneering a context-guided generative paradigm for AR lighting estimation; (2) achieving high-quality HDR lighting estimation on mobile devices at 3.2 seconds per frame, over 110× faster and 51–56% more accurate than state-of-the-art methods; and (3) demonstrating significant user preference (N=31) across material generalization, low-light robustness, and photorealistic rendering of virtual objects.

πŸ“ Abstract
High-quality environment lighting is essential for creating immersive mobile augmented reality (AR) experiences. However, achieving visually coherent estimation for mobile AR is challenging due to several key limitations in AR device sensing capabilities, including low camera FoV and limited pixel dynamic ranges. Recent advancements in generative AI, which can generate high-quality images from different types of prompts, including texts and images, present a potential solution for high-quality lighting estimation. Still, to effectively use generative image diffusion models, we must address two key limitations: content quality and slow inference. In this work, we design and implement a generative lighting estimation system called CleAR that can produce high-quality, diverse environment maps in the format of 360° HDR images. Specifically, we design a two-step generation pipeline guided by AR environment context data to ensure the output aligns with the physical environment's visual context and color appearance. To improve the estimation robustness under different lighting conditions, we design a real-time refinement component to adjust lighting estimation results on AR devices. Through a combination of quantitative and qualitative evaluations, we show that CleAR outperforms state-of-the-art lighting estimation methods in estimation accuracy, latency, and robustness, and is rated by 31 participants as producing better renderings for most virtual objects. For example, CleAR achieves 51% to 56% accuracy improvement on virtual object renderings across objects of three distinctive types of materials and reflective properties. CleAR produces lighting estimates of comparable or better quality in just 3.2 seconds, over 110× faster than state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Estimating high-quality lighting for mobile AR with limited device sensing
Improving generative AI content quality and speed for lighting estimation
Ensuring lighting estimation aligns with physical environment context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-step generative pipeline with context guidance
Real-time refinement for lighting robustness
Fast 360Β° HDR image generation in 3.2 seconds
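The paper does not spell out its real-time refinement step here. As a rough, hedged illustration of the general idea (adjusting a generated environment map toward the physical environment's color appearance), the sketch below applies a per-channel gain so the map's mean color matches the live camera frame. The function name and the simple mean-matching rule are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def refine_env_map(env_map_hdr, camera_rgb, eps=1e-6):
    """Illustrative refinement: scale a generated HDR environment map so its
    mean per-channel color matches the observed camera frame.

    env_map_hdr: (H, W, 3) float array, linear HDR radiance (generated map).
    camera_rgb:  (h, w, 3) float array, linear camera colors.
    Returns a color-corrected copy of env_map_hdr.
    """
    # Mean color per channel for the generated map and the observation.
    env_mean = env_map_hdr.reshape(-1, 3).mean(axis=0)
    cam_mean = camera_rgb.reshape(-1, 3).mean(axis=0)
    # Per-channel gain pulling the map toward the observed color balance.
    gain = cam_mean / np.maximum(env_mean, eps)
    return env_map_hdr * gain
```

A real system would operate on matched fields of view and likely estimate a more expressive correction (e.g., per-region or exposure-aware), but the gain-matching idea above conveys how a lightweight on-device adjustment can keep a generated map consistent with live sensing.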
Yiqin Zhao
Worcester Polytechnic Institute, USA
Mallesham Dasari
Northeastern University
Spatial Intelligence · AR/VR Systems · Digital Twins · Computer Networks · Wearable Computing
Tian Guo
Worcester Polytechnic Institute, USA