DyCrowd: Towards Dynamic Crowd Reconstruction from a Large-scene Video

📅 2025-08-18
🤖 AI Summary
Existing methods reconstruct 3D crowds from static images, suffering from poor temporal consistency and limited robustness to occlusions. This paper proposes DyCrowd, the first spatio-temporally consistent dynamic crowd reconstruction framework tailored for large-scene videos. The approach introduces four key components: (1) a group-guided coarse-to-fine motion optimization strategy that leverages collective motion patterns; (2) a variational autoencoder (VAE)-based human motion prior that improves resilience to long-term occlusions; (3) an Asynchronous Motion Consistency (AMC) loss that lets high-quality unoccluded motion segments guide the recovery of occluded ones despite temporal desynchronization; and (4) segment-level optimization for computational efficiency and training stability. The method is evaluated on VirtualCrowd, a new synthetic large-scene video benchmark introduced in the paper. Quantitative and qualitative results demonstrate substantial improvements over state-of-the-art methods in dynamic crowd reconstruction.

📝 Abstract
3D reconstruction of dynamic crowds in large scenes has become increasingly important for applications such as city surveillance and crowd analysis. However, current works reconstruct 3D crowds from a static image, which leads to a lack of temporal consistency and an inability to mitigate the typical impact of occlusions. In this paper, we propose DyCrowd, the first framework for spatio-temporally consistent 3D reconstruction of the poses, positions and shapes of hundreds of individuals from a large-scene video. We design a coarse-to-fine group-guided motion optimization strategy for occlusion-robust crowd reconstruction in large scenes. To address temporal instability and severe occlusions, we further incorporate a VAE (Variational Autoencoder)-based human motion prior along with segment-level group-guided optimization. The core of our strategy leverages collective crowd behavior to address long-term dynamic occlusions: by jointly optimizing the motion sequences of individuals with similar motion segments, combined with the proposed Asynchronous Motion Consistency (AMC) loss, we enable high-quality unoccluded motion segments to guide the motion recovery of occluded ones, ensuring robust and plausible results even in the presence of temporal desynchronization and rhythmic inconsistencies. Additionally, to fill the gap left by the absence of a well-annotated large-scene video dataset, we contribute VirtualCrowd, a virtual benchmark dataset for evaluating dynamic crowd reconstruction from large-scene videos. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the large-scene dynamic crowd reconstruction task. The code and dataset will be made available for research purposes.
Problem

Research questions and friction points this paper is trying to address.

Reconstruct 3D dynamic crowds from large-scene videos
Address occlusion and temporal inconsistency issues
Develop a benchmark dataset for crowd reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Coarse-to-fine group-guided motion optimization
VAE-based human motion prior integration
Asynchronous Motion Consistency loss utilization
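The abstract describes the AMC loss as letting unoccluded motion segments guide occluded ones even when the two are out of phase. As a rough illustration of that idea only, not the authors' implementation, one could align two motion segments with dynamic time warping (DTW) and penalize pose differences along the alignment path; the function names and the DTW choice here are assumptions for the sketch:

```python
import numpy as np

def dtw_path(a, b):
    """DTW alignment between two motion segments (frames x DoF).
    Returns the optimal frame-to-frame pairing."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack from the end to recover the matched frame pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def amc_loss(occluded_seg, reference_seg):
    """Asynchronous consistency: penalize pose differences along the
    DTW-aligned path, so a clean segment can guide an occluded one
    even when the two sequences are temporally desynchronized."""
    path = dtw_path(occluded_seg, reference_seg)
    diffs = [np.sum((occluded_seg[i] - reference_seg[j]) ** 2) for i, j in path]
    return float(np.mean(diffs))
```

Because the penalty is computed along the warped path rather than frame-by-frame, two individuals performing the same motion at different phases or rhythms incur a small loss, which is the property the AMC loss is stated to exploit.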
Hao Wen
College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
Hongbo Kang
College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
Jian Ma
College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
Jing Huang
College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
Yuanwang Yang
College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
Haozhe Lin
Tsinghua
Yu-Kun Lai
Professor, Cardiff University (Geometric Modeling, Geometry Processing, Computer Graphics, Image Processing, Computer Vision)
Kun Li
College of Intelligence and Computing, Tianjin University, Tianjin 300350, China