🤖 AI Summary
Research on human behavior modeling is hindered by the lack of large-scale, multimodal, high-fidelity 3D motion datasets, limiting progress on scenarios ranging from single-person actions, gestures, and locomotion to multi-person dialogue and collaboration. To address this, Embody 3D provides high-precision 3D full-body and hand poses, synchronized per-participant audio, fine-grained textual annotations, and diverse social interaction scenarios: 439 participants, 500 hours of multi-view motion capture, and over 54 million high-quality 3D motion frames. Multi-camera motion capture, speaker-separated per-participant audio recording, and collaborative behavior annotation yield strict temporal alignment across modalities and rich semantic labeling. The dataset establishes new benchmarks for complex behavior understanding and generation, supporting research in virtual avatars, natural human–computer interaction, and computational social behavior analysis.
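For scale context, 54 million frames over 500 hours implies a capture rate of roughly 30 fps (54,000,000 / (500 × 3600 s) ≈ 30). Below is a minimal sketch of cross-modal index alignment under that inferred rate, with a hypothetical 48 kHz audio sample rate; neither rate is stated in the source, so treat both constants as assumptions.

```python
# Hypothetical alignment sketch. The 30 fps figure is inferred from
# 54M frames over 500 hours; 48 kHz is an assumed audio sample rate.

MOTION_FPS = 54_000_000 / (500 * 3600)  # ≈ 30 frames per second (inferred)
AUDIO_SR = 48_000                       # assumed audio sample rate (Hz)

def motion_frame_to_audio_samples(frame_idx: int) -> tuple[int, int]:
    """Map a motion-capture frame index to the [start, end) range of
    audio samples covering the same wall-clock interval."""
    t_start = frame_idx / MOTION_FPS
    t_end = (frame_idx + 1) / MOTION_FPS
    return int(round(t_start * AUDIO_SR)), int(round(t_end * AUDIO_SR))

# Example: motion frame 90 (~3 s into a take) maps to audio samples
# [144000, 145600) at the assumed rates.
print(motion_frame_to_audio_samples(90))
```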
📝 Abstract
The Codec Avatars Lab at Meta introduces Embody 3D, a multimodal dataset of 500 individual hours of 3D motion data from 439 participants collected in a multi-camera collection stage, amounting to over 54 million frames of tracked 3D motion. The dataset features a wide range of single-person motion data, including prompted motions, hand gestures, and locomotion; as well as multi-person behavioral and conversational data like discussions, conversations in different emotional states, collaborative activities, and co-living scenarios in an apartment-like space. We provide tracked human motion including hand tracking and body shape, text annotations, and a separate audio track for each participant.
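The modalities listed above (body and hand tracking, body shape, text annotations, and a separate audio track per participant) suggest a simple per-participant record layout. The following is a hypothetical sketch; field names and array shapes are illustrative and not the dataset's actual schema.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ParticipantTake:
    """Hypothetical container for one participant in one capture take.
    All names and shapes are illustrative; the released schema may differ."""
    participant_id: str
    body_pose: np.ndarray   # (T, J_body, 3) per-frame tracked 3D body joints
    hand_pose: np.ndarray   # (T, J_hand, 3) per-frame tracked hand joints
    body_shape: np.ndarray  # (S,) per-participant body shape parameters
    audio_path: str         # path to this participant's separate audio track
    text_annotations: list[str] = field(default_factory=list)  # behavior labels
```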