🤖 AI Summary
This study addresses the inefficiency and poor user experience of traditional facial data collection, which typically relies on large sample sizes. The authors propose a streamlined two-stage acquisition strategy that leverages voice and affective data to generate virtual avatars: in the first stage, speech-driven animation is guided by emotional labels; in the second, multi-condition comparative experiments gather user feedback. Evaluations with 24 participants demonstrate that the proposed method achieves levels of realism, naturalness, and presence comparable to full-data approaches while substantially reducing data requirements and training costs. These findings validate the feasibility of reconstructing high-perceptual-quality virtual avatars under low-data conditions.
📝 Abstract
This study explores a streamlined facial data collection method for conversational contexts, addressing the limitations of existing approaches, which often require extensive datasets and prioritize technical metrics over user perception and experience. We systematically investigate which facial expression data are essential for reconstructing photorealistic avatars and how they can be captured efficiently. Our research employs a two-phase methodology to identify efficient facial data collection strategies and evaluate their effectiveness. In the first phase, we acquire facial data and evaluate reconstruction performance using utterance and emotional data. In the second phase, we conduct a comprehensive user evaluation comparing three progressive conditions: utterance data only, utterance plus emotional data, and a control condition using extensive data. Findings from 24 participants engaged in simulated face-to-face conversations reveal that targeted utterance and emotional data achieve levels of perceived realism, naturalness, and telepresence comparable to the extensive data collection approach, while reducing training time and data usage. These results demonstrate that targeted data inputs can enable efficient avatar face reconstruction, offering practical guidelines for real-time applications such as AR/VR telepresence and highlighting the trade-off between data quantity and perceived quality.