Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset

📅 2025-10-17
🤖 AI Summary
Existing research is hindered by the lack of large-scale, multimodal, high-fidelity 3D motion and behavior datasets, limiting progress on modeling complex scenarios such as single-person actions, gestures, and locomotion, as well as multi-person dialogue and collaboration. To address this, we introduce a large-scale dataset featuring high-precision 3D full-body and hand poses, synchronized multi-channel audio, fine-grained textual annotations, and diverse social interaction scenarios, comprising 439 participants, over 500 hours of multi-view motion capture, and 54 million high-quality 3D motion frames. Leveraging multi-camera motion capture, a separate audio recording per speaker, and collaborative behavior annotation, the dataset achieves strict temporal alignment across modalities and rich semantic labeling. It establishes new benchmarks for complex behavior understanding and generation, advancing research in virtual avatars, natural human–computer interaction, and computational social behavior analysis.

📝 Abstract
The Codec Avatars Lab at Meta introduces Embody 3D, a multimodal dataset of 500 individual hours of 3D motion data from 439 participants collected in a multi-camera collection stage, amounting to over 54 million frames of tracked 3D motion. The dataset features a wide range of single-person motion data, including prompted motions, hand gestures, and locomotion; as well as multi-person behavioral and conversational data like discussions, conversations in different emotional states, collaborative activities, and co-living scenarios in an apartment-like space. We provide tracked human motion including hand tracking and body shape, text annotations, and a separate audio track for each participant.
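As a quick consistency check on the reported scale, the stated duration and frame count imply a tracking rate of about 30 frames per second. Note this rate is inferred from the abstract's numbers, not a figure the authors state explicitly:

```python
# Sanity-check the dataset scale reported in the abstract:
# 500 individual hours of motion and over 54 million tracked frames.
# The implied ~30 fps is arithmetic inference, not an author-stated rate.
hours = 500
frames = 54_000_000

seconds = hours * 3600
fps = frames / seconds
print(f"Implied tracking rate: {fps:.1f} frames per second")  # → 30.0
```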
Problem

Research questions and friction points this paper is trying to address.

Creating a large-scale multimodal 3D human motion dataset
Capturing diverse single-person motions and multi-person interactions
Providing tracked motion data with annotations and audio
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale multimodal 3D motion dataset
Captures individual and interactive human behaviors
Includes tracked motion with annotations and audio
Authors

Claire McLean (Codec Avatars Lab, Meta)
Makenzie Meendering (Codec Avatars Lab, Meta)
Tristan Swartz (Codec Avatars Lab, Meta)
Orri Gabbay (Codec Avatars Lab, Meta)
Alexandra Olsen (Codec Avatars Lab, Meta)
Rachel Jacobs (Codec Avatars Lab, Meta)
Nicholas Rosen (Codec Avatars Lab, Meta)
Philippe de Bree (Codec Avatars Lab, Meta)
Tony Garcia (Codec Avatars Lab, Meta)
Gadsden Merrill (Codec Avatars Lab, Meta)
Jake Sandakly (Codec Avatars Lab, Meta)
Julia Buffalini (Codec Avatars Lab, Meta)
Neham Jain (Codec Avatars Lab, Meta)
Steven Krenn (Codec Avatars Lab, Meta)
Moneish Kumar (Codec Avatars Lab, Meta)
Dejan Markovic (Codec Avatars Lab, Meta)
Evonne Ng (Codec Avatars Lab, Meta)
Fabian Prada (Codec Avatars Lab, Meta)
Andrew Saba (Codec Avatars Lab, Meta)
Siwei Zhang (ETH Zurich)
Vasu Agrawal (Facebook Reality Labs)
Tim Godisart (Codec Avatars Lab, Meta)
Alexander Richard (Research Scientist, Facebook Reality Labs)
Michael Zollhoefer (Director, Research Scientist, Reality Labs Research, Meta)