HVIS: A Human-like Vision and Inference System for Human Motion Prediction

📅 2025-02-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenges of modeling spatiotemporal dependencies and fusing multiscale features in human motion prediction, this paper proposes a brain-inspired dual-module framework: (1) retina and visual-cortex collaborative encoding, which introduces a spatiotemporally decoupled retinomorphic encoding scheme coupled with multiscale, visual-cortex-inspired feature extraction; and (2) spontaneous-deliberate collaborative learning, which incorporates an adversarial spontaneous generative network grounded in neuronal dropout mechanisms, jointly optimized with a deliberate learning module targeting hard-to-train joints. Evaluated on the Human3.6M, CMU MoCap, and G3D benchmarks, the method improves MPJPE by 19.8%, 15.7%, and 11.1%, respectively, over state-of-the-art methods. It significantly enhances long-horizon prediction accuracy and cross-dataset generalization, establishing a new paradigm for brain-inspired human motion modeling.
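For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joint positions, taken over all frames and joints. The sketch below is illustrative only and is not the authors' evaluation code: the function names and toy data are hypothetical, and it assumes the reported percentages are relative reductions in MPJPE against a baseline.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: nested lists shaped [frames][joints][3] (x, y, z).
    Returns the Euclidean distance averaged over every frame and joint.
    """
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += math.dist(p, t)  # distance for one joint in one frame
            count += 1
    return total / count


def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to a baseline (assumed reading)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy example: one frame, two joints (hypothetical values, not from the paper).
pred = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
target = [[[0.0, 3.0, 4.0], [1.0, 0.0, 0.0]]]
error = mpjpe(pred, target)  # joint errors are 5.0 and 0.0, so the mean is 2.5
```

Under this reading, a method that lowers a baseline MPJPE of 100.0 to 80.2 (made-up numbers) would score `relative_improvement(100.0, 80.2)`, roughly 19.8.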

📝 Abstract

Grasping the intricacies of human motion, which involves perceiving spatio-temporal dependence and multi-scale effects, is essential for predicting human motion. While humans inherently possess the requisite skills, machines find them markedly more challenging to emulate. To bridge the gap, we propose the Human-like Vision and Inference System (HVIS) for human motion prediction, which is designed to emulate human observation and forecast future movements. HVIS comprises two components: the human-like vision encode (HVE) module and the human-like motion inference (HMI) module. The HVE module mimics and refines the human visual process, incorporating a retina-analog component that captures spatial and temporal information separately to avoid unnecessary crosstalk. Additionally, a visual-cortex-analog component is designed to hierarchically extract and process complex motion features, focusing on both global and local features of human poses. The HMI module simulates the multi-stage learning model of the human brain. Its spontaneous learning network simulates the neuronal fracture generation process for the adversarial generation of future motions. Subsequently, its deliberate learning network is optimized for hard-to-train joints to prevent misleading learning. Experimental results demonstrate that our method achieves new state-of-the-art performance, significantly outperforming existing methods by 19.8% on Human3.6M, 15.7% on CMU MoCap, and 11.1% on G3D.

Problem

Research questions and friction points this paper is trying to address.

Develops a system for human motion prediction.

Emulates human vision and inference processes.

Outperforms existing methods in motion prediction accuracy.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Emulates the human visual process.

Hierarchically extracts motion features.

Simulates multi-stage brain learning.
