🤖 AI Summary
This survey addresses three key challenges in speech-driven virtual human motion generation: insufficient personality expression, weak cross-modal alignment, and poor dynamic coherence. Methodologically, it reviews generative approaches spanning variational autoencoders (VAEs), generative adversarial networks (GANs), and diffusion models; covers multimodal alignment, from speech-to-motion to text-to-pose; and surveys diverse motion representations, including keypoints, neural radiance fields (NeRF), and skeletal dynamics, under a unified evaluation protocol. Its contributions include: (1) the first comprehensive review jointly covering facial and body motion generation; (2) an evaluation framework organized around realism, coherence, and expressiveness, tailored to dyadic interaction; and (3) an open-source, standardized benchmark resource cataloging 100+ methods and 30+ datasets. The survey establishes reproducible baselines, identifies six concrete future research directions, and aims to advance the practical deployment of high-fidelity, personalized, low-latency virtual human motion synthesis.
📝 Abstract
Body and face motion play an integral role in communication, conveying crucial information about the participants. Advances in generative modeling and multi-modal learning have enabled motion generation from signals such as speech, conversational context and visual cues. However, generating expressive and coherent face and body dynamics remains challenging due to the complex interplay of verbal and non-verbal cues and individual personality traits. This survey reviews body and face motion generation, covering core concepts, representation techniques, generative approaches, datasets and evaluation metrics. We highlight future directions to enhance the realism, coherence and expressiveness of avatars in dyadic settings. To the best of our knowledge, this work is the first comprehensive review to cover both body and face motion. Detailed resources are listed at https://lownish23csz0010.github.io/mogen/.