🤖 AI Summary
Existing methods for generating 4D facial expression sequences struggle to ensure temporal coherence and flexible control over sequence length at the same time. To address this challenge, this work proposes a frequency-controlled approach for 4D facial expression synthesis. The method employs a frequency-controlled LSTM to generate variable-length sequences frame by frame and introduces a multi-level identity-aware displacement network based on cross-attention to enhance geometric detail and identity consistency. Additionally, a temporal coherence loss is incorporated to improve motion smoothness across frames. Evaluated on the CoMA and Florence4D datasets, the proposed approach achieves state-of-the-art performance, demonstrating that it can generate high-quality 4D facial animations of controllable, flexible duration.
📝 Abstract
4D facial expression synthesis is a critical problem in computer vision and graphics. Current methods lack flexibility and smoothness when simulating the inter-frame motion of expression sequences. In this paper, we propose FC-4DFS, a frequency-controlled 4D facial expression synthesis method. Specifically, we introduce a frequency-controlled LSTM network that generates a 4D facial expression sequence of a specified length, frame by frame, from a given neutral landmark. We also propose a temporal coherence loss to enhance the perception of temporal sequence motion and improve the accuracy of relative displacements. Furthermore, we design a Multi-level Identity-Aware Displacement Network based on a cross-attention mechanism to reconstruct the 4D facial expression sequences from landmark sequences. Finally, FC-4DFS achieves flexible, state-of-the-art generation of 4D facial expression sequences of different lengths on the CoMA and Florence4D datasets. The code will be available on GitHub.
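The abstract does not state the formula for the temporal coherence loss. As a minimal sketch only, assuming the loss penalizes the mismatch between predicted and ground-truth frame-to-frame landmark displacements, one possible form is shown below. The L1 distance, the function name `temporal_coherence_loss`, and the nested-list layout `[frame][landmark][coordinate]` are illustrative assumptions, not details taken from the paper.

```python
def temporal_coherence_loss(pred, target):
    """Mean absolute error between frame-to-frame displacements.

    Hypothetical sketch: pred and target are landmark sequences indexed as
    [frame][landmark][coordinate]. The L1 form is an assumption made for
    illustration; the paper's actual loss may differ.
    """
    if len(pred) != len(target):
        raise ValueError("sequences must have the same number of frames")
    if len(pred) < 2:
        raise ValueError("at least two frames are needed to form a displacement")

    total = 0.0
    count = 0
    for t in range(1, len(pred)):
        # Walk over matching landmarks in consecutive frames of both sequences.
        frames = zip(pred[t - 1], pred[t], target[t - 1], target[t])
        for p_prev, p_cur, g_prev, g_cur in frames:
            for a, b, c, d in zip(p_prev, p_cur, g_prev, g_cur):
                # (b - a) is the predicted displacement of one coordinate,
                # (d - c) is the ground-truth displacement of the same coordinate.
                total += abs((b - a) - (d - c))
                count += 1
    return total / count
```

Because this sketch compares displacements rather than positions, a prediction that is offset from the ground truth by a constant incurs no penalty, while frame-level jitter does. A term of this kind would therefore normally be combined with a per-frame reconstruction loss.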