Vivar: A Generative AR System for Intuitive Multi-Modal Sensor Data Presentation

📅 2024-12-18

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

career value

225K/year

🤖 AI Summary

This work addresses the challenge of interpreting multimodal sensor data for non-experts—stemming from its inherent complexity, cross-modal semantic gap, and dynamic time-varying nature—by proposing the first generative augmented reality (AR) system tailored for domain-agnostic users. Methodologically: (1) it introduces centroid interpolation-based cross-modal embedding, mapping raw sensor data into a pretrained visual latent space; (2) it designs an end-to-end AR scene generation framework requiring no domain-specific knowledge; and (3) it incorporates latent variable reuse and caching to enhance efficiency. Technically, the system integrates cross-modal embedding, foundation model–driven generation, and 3D Gaussian Splatting (3DGS), achieving an 11× reduction in inference latency without compromising visual fidelity. A user study with 485 participants demonstrates statistically significant improvements over baselines in explanation accuracy, cross-scenario consistency, and real-world applicability.

Technology Category

Application Category

📝 Abstract

Understanding sensor data can be challenging for non-experts because of the complexity and unique semantic meanings of sensor modalities. This calls for intuitive and effective methods to present sensor information. However, creating intuitive sensor data visualizations presents three key challenges: the variability of sensor readings, gaps in domain comprehension, and the dynamic nature of sensor data. To address these issues, we develop Vivar, a novel AR system that integrates multi-modal sensor data and presents 3D volumetric content for visualization. In particular, we introduce a cross-modal embedding approach that maps sensor data into a pre-trained visual embedding space through barycentric interpolation. This allows for accurate and continuous integration of multi-modal sensor information. Vivar also incorporates sensor-aware AR scene generation using foundation models and 3D Gaussian Splatting (3DGS) without requiring domain expertise. In addition, Vivar leverages latent reuse and caching strategies to accelerate 2D and AR content generation. Our extensive experiments demonstrate that our system achieves 11$ imes$ latency reduction without compromising quality. A user study involving over 485 participants, including domain experts, demonstrates Vivar's effectiveness in accuracy, consistency, and real-world applicability, paving the way for more intuitive sensor data visualization.

Problem

Research questions and friction points this paper is trying to address.

Addresses difficulty in understanding multi-modal sensor data for non-experts

Overcomes challenges in visualizing dynamic and variable sensor readings

Enhances AR-based sensor data presentation without domain expertise

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal embedding maps sensor data visually

Sensor-aware AR generation with foundation models

Latent reuse accelerates content generation significantly

🔎 Similar Papers

No similar papers found.

Nvidia

The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

US, CA, Santa Clara

Research Scientist Intern, Applied Perception Science (PhD)