🤖 AI Summary
Existing facial expression generation datasets are constrained by speech-driven paradigms or coarse-grained emotion labels, lacking fine-grained semantic descriptions and relying on costly motion-capture systems. To address these limitations, we introduce the first lightweight benchmark for text-driven 4D facial animation: a high-fidelity dataset captured with consumer-grade RGB-D sensors, parameterized via ARKit blendshapes, and augmented with rich, nuanced natural language instructions automatically generated by large language models. This pairing enables many-to-many text-to-motion mappings. Building on this resource, we train and evaluate text-to-facial-motion generation models, demonstrating improvements in both semantic fidelity (accurately interpreting diverse linguistic descriptions) and expressive diversity. The dataset, training code, and pretrained models are fully open-sourced to foster reproducible research.
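For intuition, a sample in an ARKit-blendshape dataset can be thought of as a text prompt paired with a time series of per-frame blendshape weight vectors (ARKit exposes 52 face coefficients, each in [0, 1]). The sketch below is illustrative only; the field names, prompt, and frame rate are assumptions, not the paper's actual schema:

```python
import numpy as np

NUM_ARKIT_BLENDSHAPES = 52  # ARKit's face tracking exposes 52 blendshape coefficients

def make_sequence(num_frames: int, seed=None) -> dict:
    """Build a toy text-to-motion sample: a natural-language prompt paired
    with a (num_frames, 52) array of blendshape weights in [0, 1].
    Here the motion is random noise; a real sample would hold captured data."""
    rng = np.random.default_rng(seed)
    return {
        "text": "raise both eyebrows, then smile slowly",  # hypothetical prompt
        "motion": rng.uniform(0.0, 1.0, size=(num_frames, NUM_ARKIT_BLENDSHAPES)),
    }

sample = make_sequence(num_frames=120)  # e.g. ~2 seconds at 60 fps
print(sample["motion"].shape)  # (120, 52)
```

Because each row is a standard blendshape weight vector, such a sequence can directly drive any ARKit-compatible face rig.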
📝 Abstract
Dynamic facial expression generation from natural language is a crucial task in Computer Graphics, with applications in Animation, Virtual Avatars, and Human-Computer Interaction. However, current generative models suffer from datasets that are either speech-driven or limited to coarse emotion labels, lack the nuanced, expressive descriptions needed for fine-grained control, and were captured with elaborate and expensive equipment. We therefore present a new dataset of facial motion sequences featuring nuanced performances and semantic annotation. The data is collected with commodity equipment, guided by LLM-generated natural language instructions, and stored in the popular ARKit blendshape format. This yields riggable motion, rich with expressive performances and labels. We accordingly train two baseline models and evaluate their performance for future benchmarking. Trained on our Express4D dataset, the models learn meaningful text-to-expression motion generation and capture the many-to-many mapping between the two modalities. The dataset, code, and video examples are available on our webpage: https://jaron1990.github.io/Express4D/