From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models

📅 2025-04-07
🤖 AI Summary
Problem: In XR applications, vision-based hand-tracking signals are sparse and frequently lost, which leads to discontinuous full-body motion generation and diminished immersion. Method: We propose the Rolling Prediction Model (RPM), the first lightweight online temporal framework enabling adaptive fusion of tracking and synthesis modalities, incorporating motion-prior constraints and a gated mode-switching mechanism. Contribution/Results: We introduce GORP, the first real-world VR benchmark dataset for sparse inputs (more than 14 hours, 28 subjects), filling a critical gap in realistic evaluation. On both GORP and synthetic benchmarks, RPM outperforms state-of-the-art methods with millisecond-level latency, high motion naturalness, and strong robustness to tracking dropouts. All code, pretrained models, and the GORP dataset are publicly released.

📝 Abstract
In extended reality (XR), generating full-body motion of the users is important to understand their actions, drive their virtual avatars for social interaction, and convey a realistic sense of presence. While prior works focused on spatially sparse and always-on input signals from motion controllers, many XR applications opt for vision-based hand tracking for reduced user friction and better immersion. Compared to controllers, hand tracking signals are less accurate and can even be missing for an extended period of time. To handle such unreliable inputs, we present the Rolling Prediction Model (RPM), an online and real-time approach that generates smooth full-body motion from temporally and spatially sparse input signals. Our model generates 1) accurate motion that matches the inputs (i.e., tracking mode) and 2) plausible motion when inputs are missing (i.e., synthesis mode). More importantly, RPM generates seamless transitions from tracking to synthesis, and vice versa. To demonstrate the practical importance of handling noisy and missing inputs, we present GORP, the first dataset of realistic sparse inputs from a commercial virtual reality (VR) headset with paired high-quality body motion ground truth. GORP provides more than 14 hours of VR gameplay data from 28 people using motion controllers (spatially sparse) and hand tracking (spatially and temporally sparse). We benchmark RPM against the state of the art on both synthetic data and GORP to highlight how we can bridge the gap for real-world applications with a realistic dataset and by handling unreliable input signals. Our code, pretrained models, and GORP dataset are available on the project webpage.
Problem

Research questions and friction points this paper is trying to address.

Generating smooth full-body motion from sparse inputs in XR
Handling unreliable vision-based hand tracking signals in VR
Seamlessly transitioning between tracking and synthesis motion modes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time motion generation with RPM
Handling of unreliable vision-based hand tracking
Seamless transition between tracking and synthesis
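
The two modes described above can be illustrated with a toy online loop. This is a minimal sketch only, not the RPM architecture (which is a learned model whose details are not given on this page): the function name `rolling_generate`, the `blend` and `damping` parameters, and the damped constant-velocity extrapolation are all assumptions made for illustration. A frame with an observation is handled in "tracking" mode, and a frame whose input is `None` is handled in "synthesis" mode by continuing the previous motion.

```python
from typing import List, Optional, Sequence, Tuple

Pose = List[float]


def rolling_generate(
    inputs: Sequence[Optional[Pose]],
    dim: int,
    blend: float = 0.5,    # hypothetical: how strongly to follow an observation
    damping: float = 0.9,  # hypothetical: velocity decay while extrapolating
) -> List[Tuple[str, Pose]]:
    """Toy per-frame loop: follow inputs when present, extrapolate when missing."""
    prev = [0.0] * dim      # last generated pose
    velocity = [0.0] * dim  # last per-frame change
    outputs: List[Tuple[str, Pose]] = []

    for obs in inputs:
        # Prediction from history alone: continue the previous motion.
        predicted = [p + damping * v for p, v in zip(prev, velocity)]

        if obs is None:
            # Synthesis mode: no input this frame, keep the prediction.
            mode, current = "synthesis", predicted
        else:
            # Tracking mode: pull the prediction toward the observation.
            mode = "tracking"
            current = [(1.0 - blend) * q + blend * o
                       for q, o in zip(predicted, obs)]

        velocity = [c - p for c, p in zip(current, prev)]
        prev = current
        outputs.append((mode, current))

    return outputs


# Two observed frames, a two-frame dropout, then the signal returns.
frames = [[0.0], [1.0], None, None, [4.0]]
result = rolling_generate(frames, dim=1, blend=1.0, damping=1.0)
```

Because the prediction is computed every frame and only corrected when an observation exists, the output stays continuous across the dropout; a learned model would replace the hand-written extrapolation with a motion prior.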