Toward Safe, Trustworthy and Realistic Augmented Reality User Experience

📅 2025-07-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Malicious virtual content in augmented reality (AR) poses critical security risks, including occlusion of safety-critical information and covert manipulation of user perception, threatening user trust and safety. Method: The paper presents ViDDAR and VIM-Sense, two systems that detect such attacks by combining vision-language models (VLMs) with multimodal reasoning modules. Building on these systems, it proposes three future directions: automated, perceptually aligned quality assessment of virtual content; detection of multimodal attacks; and lightweight VLM adaptation for efficient deployment on resource-constrained AR devices. Contribution/Results: The work jointly leverages multimodal semantic understanding and human perceptual modeling for fine-grained detection of both occlusion-based and manipulation-based attacks, and outlines a scalable, human-aligned framework for trustworthy AR content, with open questions posed on perceptual modeling, multimodal attack implementation, and lightweight model adaptation.

📝 Abstract
As augmented reality (AR) becomes increasingly integrated into everyday life, ensuring the safety and trustworthiness of its virtual content is critical. Our research addresses the risks of task-detrimental AR content, particularly that which obstructs critical information or subtly manipulates user perception. We developed two systems, ViDDAR and VIM-Sense, to detect such attacks using vision-language models (VLMs) and multimodal reasoning modules. Building on this foundation, we propose three future directions: automated, perceptually aligned quality assessment of virtual content; detection of multimodal attacks; and adaptation of VLMs for efficient and user-centered deployment on AR devices. Overall, our work aims to establish a scalable, human-aligned framework for safeguarding AR experiences and seeks feedback on perceptual modeling, multimodal AR content implementation, and lightweight model adaptation.
Problem

Research questions and friction points this paper is trying to address.

Ensuring safety and trustworthiness of AR virtual content
Detecting task-detrimental AR content obstructing critical information
Developing scalable human-aligned frameworks for AR experiences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language models detect AR content risks
Multimodal reasoning enhances attack detection
Lightweight VLM adaptation for AR devices
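As a toy illustration of the occlusion-based attack class described above (not the paper's actual ViDDAR or VIM-Sense pipeline, which uses VLMs), one can flag virtual overlays whose bounding boxes hide too much of a safety-critical screen region. A minimal sketch, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` tuples in screen coordinates and a hypothetical coverage threshold:

```python
def coverage(critical, overlay):
    """Fraction of the safety-critical region covered by a virtual overlay.

    Both boxes are axis-aligned (x1, y1, x2, y2) tuples in screen coordinates.
    """
    ix1 = max(critical[0], overlay[0])
    iy1 = max(critical[1], overlay[1])
    ix2 = min(critical[2], overlay[2])
    iy2 = min(critical[3], overlay[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # intersection area
    area = (critical[2] - critical[0]) * (critical[3] - critical[1])
    return inter / area if area else 0.0

def is_occlusion_attack(critical, overlays, threshold=0.5):
    """Flag an attack if any single overlay hides more than `threshold`
    of the critical region (e.g. an exit sign or a hazard warning)."""
    return any(coverage(critical, o) > threshold for o in overlays)

# An overlay fully covering an exit-sign region is flagged; a small
# caption that only grazes it is not.
exit_sign = (100, 100, 200, 150)
print(is_occlusion_attack(exit_sign, [(90, 90, 210, 160)]))    # covers 100%
print(is_occlusion_attack(exit_sign, [(150, 100, 170, 110)]))  # covers 4%
```

A geometric check like this captures only the simplest obstruction case; the semantic attacks the paper targets (content that misleads rather than merely covers) are why it turns to VLMs and multimodal reasoning instead.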