Gen4D: Synthesizing Humans and Scenes in the Wild

📅 2025-06-03
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Problem: Real-world sports scene data is scarce, while synthetic data suffers from limited diversity and low fidelity. Method: We propose Gen4D, a fully automated framework for generating 4D human animations. It integrates expert-driven motion encoding, prompt-guided avatar generation using diffusion-based Gaussian splatting, and human-aware background synthesis, removing the reliance on manual modeling and fixed asset libraries. Leveraging kinematics-guided generation and joint scene-human optimization, Gen4D produces high-fidelity, high-diversity sequences of humans and their environments. Contribution/Results: Based on Gen4D, we construct SportPAL, a large-scale synthetic dataset covering baseball, ice hockey, and soccer. Evaluated on in-the-wild human behavior understanding tasks, models trained on SportPAL data demonstrate significant performance gains. The pipeline requires no manual 3D modeling or scene design, which supports scalable, realistic sports scene generation.

📝 Abstract
Lack of input data for in-the-wild activities often results in low performance across various computer vision tasks. This challenge is particularly pronounced in uncommon human-centric domains like sports, where real-world data collection is complex and impractical. While synthetic datasets offer a promising alternative, existing approaches typically suffer from limited diversity in human appearance, motion, and scene composition due to their reliance on rigid asset libraries and hand-crafted rendering pipelines. To address this, we introduce Gen4D, a fully automated pipeline for generating diverse and photorealistic 4D human animations. Gen4D integrates expert-driven motion encoding, prompt-guided avatar generation using diffusion-based Gaussian splatting, and human-aware background synthesis to produce highly varied and lifelike human sequences. Based on Gen4D, we present SportPAL, a large-scale synthetic dataset spanning three sports: baseball, ice hockey, and soccer. Together, Gen4D and SportPAL provide a scalable foundation for constructing synthetic datasets tailored to in-the-wild human-centric vision tasks, with no need for manual 3D modeling or scene design.
Problem

Research questions and friction points this paper is trying to address.

Lack of diverse in-the-wild human activity data
Limited synthetic dataset diversity in appearance and motion
Challenges in manual 3D modeling for human-centric scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline for diverse 4D human animations
Diffusion-based Gaussian splatting for avatar generation
Human-aware background synthesis for lifelike sequences
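The three components listed above can be read as stages of one pipeline. The sketch below is only an illustration of that composition, not the authors' implementation: the names `Clip4D`, `generate_clip`, and the `stub_*` functions are hypothetical, the stages are stand-ins that return placeholder values, and the ordering (motion, then avatar, then a background stage that receives both) is an assumption drawn from the phrase "human-aware".

```python
from dataclasses import dataclass
from typing import Callable, List

Pose = List[float]  # placeholder for one frame of pose parameters


@dataclass
class Clip4D:
    """One synthetic sample: a motion sequence, an avatar, and a scene."""
    sport: str
    motion: List[Pose]
    avatar: str       # stand-in for a Gaussian-splat avatar
    background: str   # stand-in for a synthesized scene


def generate_clip(
    sport: str,
    appearance_prompt: str,
    encode_motion: Callable[[str], List[Pose]],
    make_avatar: Callable[[str], str],
    make_background: Callable[[str, List[Pose], str], str],
) -> Clip4D:
    """Run the three stages in sequence (hypothetical ordering)."""
    motion = encode_motion(sport)
    avatar = make_avatar(appearance_prompt)
    # Assumption: "human-aware" means the background stage is conditioned
    # on the human it will contain, so it receives motion and avatar.
    background = make_background(sport, motion, avatar)
    return Clip4D(sport, motion, avatar, background)


# Stub stages so the sketch runs; real stages would call learned models.
def stub_motion(sport: str) -> List[Pose]:
    return [[0.0, 0.0, 0.0] for _ in range(4)]


def stub_avatar(prompt: str) -> str:
    return f"avatar<{prompt}>"


def stub_background(sport: str, motion: List[Pose], avatar: str) -> str:
    return f"{sport} scene for {avatar} over {len(motion)} frames"


clip = generate_clip(
    "soccer", "goalkeeper in a yellow kit",
    stub_motion, stub_avatar, stub_background,
)
print(clip.background)
```

Passing the stages in as callables keeps the composition separate from any particular model, so each stub could be replaced independently.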
Authors
Jerrin Bright
University of Waterloo
3D Human Modeling, Computer Vision, Autonomous Navigation
Zhibo Wang
Vision and Image Processing Lab, University of Waterloo, Canada
Yuhao Chen
Vision and Image Processing Lab, University of Waterloo, Canada
Sirisha Rambhatla
Assistant Professor at the University of Waterloo
Machine Learning, Statistical Signal Processing, Optimization, AI for Healthcare
John Zelek
Vision and Image Processing Lab, University of Waterloo, Canada
David Clausi
Vision and Image Processing Lab, University of Waterloo, Canada