SuperFace: Preference-Aligned Facial Expression Estimation Beyond Pseudo Supervision

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for predicting ARKit facial expression coefficients rely on pseudo labels that suffer from noise, bias, and missing expressions, often yielding unnatural facial animations. To address this limitation, this work proposes SuperFace, a framework that, according to the authors, is the first to introduce human preference feedback into the optimization of ARKit coefficients. Starting from software-generated coefficients as initialization, SuperFace integrates differentiable rendering with preference learning to establish an end-to-end perceptual alignment pipeline. Moving beyond conventional pseudo-supervised paradigms, the method shifts the optimization objective from fitting numerical labels to improving visual realism and expressiveness. The reported experiments show SuperFace outperforming the Live Link Face baseline in expression fidelity and naturalness, supporting preference-driven optimization for semantic facial action modeling.
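The summary does not state which preference objective SuperFace uses. As a minimal sketch of what "preference learning" over rendered expressions can look like, the block below implements a generic Bradley-Terry pairwise loss. The function name `preference_loss` and the idea of a scalar score per rendered candidate are assumptions made for illustration, not the paper's formulation.

```python
import math


def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one human comparison.

    Each score is a hypothetical scalar rating of a rendered expression
    (for example, the output of a learned reward model). The loss is small
    when the render the annotator preferred has the higher score.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)) == softplus(-margin), written in a numerically
    # stable form so large margins do not overflow math.exp.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Equal scores give log(2); a correct ranking drives the loss toward zero.
tie = preference_loss(0.0, 0.0)
agree = preference_loss(3.0, 0.0)
disagree = preference_loss(0.0, 3.0)
```

Minimizing this loss over many annotated pairs pushes the scoring function, and any coefficient predictor trained against it, toward outputs people prefer rather than toward the pseudo labels themselves.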

📝 Abstract

Accurate facial expression estimation is crucial for realistic digital human animation, and ARKit blendshape coefficients offer an interpretable representation by mapping facial motions to semantic animation controls. However, learning high-quality ARKit coefficient prediction remains limited by the absence of reliable ground-truth supervision. Existing methods typically rely on capture software such as Live Link Face to provide pseudo labels, which may contain noisy activations, biased coefficient magnitudes, and missing or inaccurate facial actions. Consequently, models trained with supervised learning tend to reproduce imperfect pseudo labels rather than optimize for perceptual expression fidelity. In this paper, we propose SuperFace, a preference-driven framework that moves ARKit facial expression estimation from pseudo-label imitation toward human-aligned perceptual optimization. Instead of treating software-estimated coefficients as fixed ground truth, SuperFace uses them only as an initialization and further improves coefficient prediction through human preference feedback on rendered facial expressions. By aligning the model with perceptual judgments rather than numerical pseudo labels, SuperFace enables more visually faithful and expressive facial animation. Experiments show that SuperFace improves expression fidelity over Live Link Face supervision, demonstrating the effectiveness of preference-driven optimization for semantic facial action prediction.
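To make the "interpretable representation" concrete, the sketch below shows the standard linear blendshape model: a posed face is the neutral mesh plus a coefficient-weighted sum of per-shape vertex offsets, with each coefficient in [0, 1]. The shape name `jawOpen` is a real ARKit blendshape location; the three-value mesh and the helper `apply_blendshapes` are toy assumptions for illustration, and this is not the paper's renderer.

```python
from typing import Dict, List


def apply_blendshapes(
    neutral: List[float],
    deltas: Dict[str, List[float]],
    coefficients: Dict[str, float],
) -> List[float]:
    """Return the neutral mesh plus the sum of weight * delta per shape.

    `neutral` and every entry of `deltas` are flattened vertex coordinates
    of equal length. Coefficients are clamped to the [0, 1] range.
    """
    posed = list(neutral)
    for name, weight in coefficients.items():
        w = min(max(weight, 0.0), 1.0)
        for i, offset in enumerate(deltas[name]):
            posed[i] += w * offset
    return posed


# Toy one-vertex "mesh": jawOpen moves the vertex down along y.
neutral = [0.0, 0.0, 0.0]
deltas = {"jawOpen": [0.0, -1.0, 0.0]}
half_open = apply_blendshapes(neutral, deltas, {"jawOpen": 0.5})
clamped = apply_blendshapes(neutral, deltas, {"jawOpen": 1.5})
```

Because each coefficient controls one named facial action, a noisy or missing activation in the pseudo labels shows up directly as a wrong or absent movement in the rendered face, which is the failure mode the abstract describes.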

Problem

Research questions and friction points this paper is trying to address.

facial expression estimation

pseudo supervision

ARKit blendshape

perceptual fidelity

digital human animation

Innovation

Methods, ideas, or system contributions that make the work stand out.

preference-driven learning

facial expression estimation

ARKit blendshape