SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality

šŸ“… 2025-08-24
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
Current in-vehicle AR systems struggle to separate dynamic cabin and road scenes, lack environment-adaptive spatial alignment and perception-consistent rendering, and offer neither LLM-driven context-aware recommendation nor a real-world SLAM evaluation benchmark for driving. This paper proposes the first framework for semantic, dynamic cabin–road separation, using depth-guided vision–language grounding for cross-modal alignment. A dual-branch, context-aware SLAM architecture enables robust 6DoF tracking in each scene context. The authors introduce EgoSLAM-Drive, the first real-world, first-person in-vehicle AR dataset, together with the first GPT-driven AR content recommendation module for driving scenarios. Experiments demonstrate significant improvements in spatial alignment accuracy, AR rendering consistency, user scene comprehension, prompt relevance, and driving comfort across diverse driving conditions.

šŸ“ Abstract
We present SEER-VAR, a novel framework for egocentric vehicle-based augmented reality (AR) that unifies semantic decomposition, Context-Aware SLAM Branches (CASB), and LLM-driven recommendation. Unlike existing systems that assume static or single-view settings, SEER-VAR dynamically separates cabin and road scenes via depth-guided vision-language grounding. Two SLAM branches track egocentric motion in each context, while a GPT-based module generates context-aware overlays such as dashboard cues and hazard alerts. To support evaluation, we introduce EgoSLAM-Drive, a real-world dataset featuring synchronized egocentric views, 6DoF ground-truth poses, and AR annotations across diverse driving scenarios. Experiments demonstrate that SEER-VAR achieves robust spatial alignment and perceptually coherent AR rendering across varied environments. As one of the first to explore LLM-based AR recommendation in egocentric driving, we address the lack of comparable systems through structured prompting and detailed user studies. Results show that SEER-VAR enhances perceived scene understanding, overlay relevance, and driver ease, providing an effective foundation for future research in this direction. Code and dataset will be made open source.
Problem

Research questions and friction points this paper is trying to address.

Dynamic separation of cabin and road scenes
Robust spatial alignment for AR rendering
LLM-driven context-aware overlay generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Depth-guided semantic decomposition for scene separation
Dual SLAM branches for contextual motion tracking
GPT-based module for context-aware AR overlays
Yuzhi Lai
Eberhard-Karls-Universität Tübingen
Shenghai Yuan
Nanyang Technological University
Peizheng Li
Eberhard-Karls-Universität Tübingen, Mercedes-Benz AG
Jun Lou
Mercedes-Benz AG
Andreas Zell
Professor of Computer Science, Universität Tübingen
Robotics, Bioinformatics, Machine Learning, Artificial Intelligence, Image Processing