Seeing My Future: Predicting Situated Interaction Behavior in Virtual Reality

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical challenge of human intent understanding and future behavior prediction in VR/AR systems. We propose a hierarchical, intent-aware dynamic graph convolutional network (GCN) framework that uniquely integrates cognition-driven intent modeling into dynamic GCNs. The method jointly learns high-level user motivations and fine-grained actions—such as gaze direction and object interaction—while incorporating historical human pose sequences and scene context to model the spatiotemporal evolution of human–environment interactions. Evaluated on real-world benchmark datasets and an in-the-loop real-time VR environment, our approach achieves significant improvements over state-of-the-art methods in prediction accuracy, temporal consistency, and cross-scenario generalizability. It establishes a novel paradigm for building proactive, adaptive intelligent VR/AR systems capable of anticipating user behavior.

📝 Abstract
Virtual and augmented reality systems increasingly demand intelligent adaptation to user behaviors for enhanced interaction experiences. Achieving this requires accurately understanding human intentions and predicting future situated behaviors - such as gaze direction and object interactions - which is vital for creating responsive VR/AR environments and applications like personalized assistants. However, accurate behavioral prediction demands modeling the underlying cognitive processes that drive human-environment interactions. In this work, we introduce a hierarchical, intention-aware framework that models human intentions and predicts detailed situated behaviors by leveraging cognitive mechanisms. Given historical human dynamics and observations of scene context, our framework first identifies potential interaction targets and then forecasts fine-grained future behaviors. We propose a dynamic Graph Convolutional Network (GCN) to effectively capture human-environment relationships. Extensive experiments on challenging real-world benchmarks and a live VR environment demonstrate the effectiveness of our approach, achieving superior performance across all metrics and enabling practical applications for proactive VR systems that anticipate user behaviors and adapt virtual environments accordingly.
Problem

Research questions and friction points this paper is trying to address.

Predicting human intentions and situated behaviors in VR
Modeling cognitive processes driving human-environment interactions
Developing proactive VR systems that anticipate user actions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical intention-aware framework models cognitive mechanisms
Dynamic Graph Convolutional Network captures human-environment relationships
Predicts fine-grained situated behaviors from historical dynamics
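To make the core building block concrete: the paper's dynamic GCN propagates features over a graph whose nodes mix body joints and scene objects. The sketch below is not the authors' code; it is a minimal, generic graph-convolution step in NumPy (the node counts, feature sizes, and edge list are invented for illustration), showing how neighbor features are aggregated over a normalized human-environment adjacency matrix.

```python
import numpy as np

# Illustrative sketch only: one message-passing step of a graph convolution
# over a combined human-environment graph. Nodes are body joints plus
# scene-object candidates; edges encode skeletal and spatial connections.

def normalized_adjacency(adj):
    """Symmetrically normalize adjacency with self-loops:
    A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(features, adj, weight):
    """H' = ReLU(A_hat @ H @ W): aggregate neighbors, then transform."""
    h = normalized_adjacency(adj) @ features @ weight
    return np.maximum(h, 0.0)

# Toy graph: 4 "joint" nodes + 2 "object" nodes, 3-D features each.
rng = np.random.default_rng(0)
features = rng.standard_normal((6, 3))
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 5)]:  # joint chain + object links
    adj[i, j] = adj[j, i] = 1.0
weight = rng.standard_normal((3, 8))

out = gcn_layer(features, adj, weight)
print(out.shape)  # per-node embeddings for downstream intent/behavior heads
```

In the paper's setting the adjacency would be re-estimated per frame from the evolving scene (hence "dynamic"), and the resulting node embeddings would feed the hierarchical intent and fine-grained behavior predictors.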
Yuan Xu
Peking University
Zimu Zhang
Peking University
Xiaoxuan Ma
Peking University
Computer Vision · Digital Humans · AI for Science
Wentao Zhu
Eastern Institute of Technology, Ningbo
Yu Qiao
Shanghai Jiao Tong University
Yizhou Wang
Peking University