Learning Priors of Human Motion With Vision Transformers

📅 2024-07-02
🏛️ Annual International Computer Software and Applications Conference
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of modeling human motion priors in urban scenes to support crowd flow analysis and robot navigation among people. Conventional CNNs struggle to capture long-range spatial dependencies, which limits their ability to model cross-regional movement patterns. To address this, the authors propose a neural architecture for human motion modeling based on Vision Transformers (ViTs). The method jointly encodes trajectory sequences and positional embeddings, and uses global self-attention to represent motion correlations between regions. The architecture predicts frequently traversed paths, velocity distributions, and stationary regions. Evaluated on a standard dataset, the approach improves on a CNN-based baseline across the reported metrics, which supports the use of global attention for learning motion priors. The intended applications are the analysis of collective human behavior and robot navigation in dynamic urban environments.

📝 Abstract
A clear understanding of where humans move in a scenario, their usual paths and speeds, and where they stop is very important for applications such as mobility studies in urban areas or robot navigation in human-populated environments. In this article, we propose a neural architecture based on Vision Transformers (ViTs) to provide this information. This solution can arguably capture spatial correlations more effectively than Convolutional Neural Networks (CNNs). We describe the methodology and the proposed neural architecture, and present experimental results on a standard dataset. The results show that the proposed ViT architecture improves the metrics compared to a CNN-based method.
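The claim that a ViT can capture spatial correlations across a whole scene can be illustrated with a minimal, dependency-free sketch of single-head self-attention over grid patches. This is not the authors' architecture: the 8x8 occupancy grid, the 2x2 patch size, the random projection matrices, and every function name below are assumptions made only for illustration. It shows that each patch receives a nonzero attention weight for every other patch, including distant ones, whereas a single small convolution kernel only sees a local neighborhood.

```python
import math
import random


def split_into_patches(grid, patch):
    """Cut an H x W grid into non-overlapping patch x patch tiles (row-major tokens)."""
    h, w = len(grid), len(grid[0])
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([grid[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def project(token, weights):
    """Multiply a token of length d_in by a d_in x d_out weight matrix."""
    d_out = len(weights[0])
    return [sum(t * row[k] for t, row in zip(token, weights)) for k in range(d_out)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(tokens, w_q, w_k):
    """Return the N x N matrix of scaled dot-product attention weights."""
    queries = [project(t, w_q) for t in tokens]
    keys = [project(t, w_k) for t in tokens]
    scale = math.sqrt(len(queries[0]))
    return [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
            for q in queries]


def random_matrix(rows, cols, rng):
    """Stand-in for learned parameters (assumption: untrained random values)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Hypothetical 8x8 occupancy grid: each cell holds a made-up visit frequency.
grid = [[rng.random() for _ in range(8)] for _ in range(8)]

# 2x2 patches give 16 tokens of length 4.
tokens = split_into_patches(grid, patch=2)

# Add one positional embedding vector per patch so that location is encoded.
pos_emb = random_matrix(len(tokens), len(tokens[0]), rng)
tokens = [[x + p for x, p in zip(tok, pos)] for tok, pos in zip(tokens, pos_emb)]

w_q = random_matrix(4, 8, rng)
w_k = random_matrix(4, 8, rng)
attn = self_attention_weights(tokens, w_q, w_k)

# The top-left patch (index 0) attends to the bottom-right patch (index 15).
print(f"weight from patch 0 to patch 15: {attn[0][15]:.4f}")
```

With trained projections, large off-diagonal weights would indicate regions whose motion patterns are related even when they are far apart in the scene; here the weights are random and only the connectivity pattern is meaningful.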
Problem

Research questions and friction points this paper is trying to address.

Pedestrian Movement Patterns
Urban Crowd Flow
Robot Navigation in Crowds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformers
Spatial Relationship Understanding
Crowd Movement Analysis
Placido Falqueto
Dept. of Information Engineering and Computer Science, University of Trento, Italy
Alberto Sanfeliu
Full Professor, Universitat Politecnica de Catalunya & Institut de Robotica i Informatica Industrial
Robotics, Artificial Intelligence, Pattern Recognition, Human-Robot Interaction
Luigi Palopoli
Dept. of Information Engineering and Computer Science, University of Trento, Italy
Daniele Fontanelli
Professor, University of Trento
Instrumentation and Measurement, Robotics, Estimation