🤖 AI Summary
This work addresses fine-grained facial expression editing, which is hindered by semantic overlap among expressions and by the difficulty of achieving precise, continuous, identity-preserving control. To this end, the authors introduce the Flex Facial Expression (FFE) dataset, the first expression-editing dataset with continuous affective annotations, along with a comprehensive evaluation benchmark, FFE-Bench. They further propose PixelSmile, a diffusion-based framework that disentangles expression semantics through fully symmetric joint training and integrates intensity supervision with contrastive learning to enable linearly interpolable expression control in the text latent space. PixelSmile is the first method to support smooth, fine-grained, continuously controllable expression editing, and it significantly outperforms existing approaches in structural clarity, editing accuracy, linearity of control, and identity preservation.
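To make "linearly interpolable expression control within the text latent space" concrete, the sketch below blends a neutral prompt embedding toward an expression embedding with a scalar intensity. This is a minimal illustration under assumptions, not PixelSmile's actual implementation: the function name, the `prompt_embeds`-style pipeline interface, and the [0, 1] intensity range are all hypothetical.

```python
import torch

def blend_expression(neutral_emb: torch.Tensor,
                     expression_emb: torch.Tensor,
                     intensity: float) -> torch.Tensor:
    """Linearly interpolate from a neutral prompt embedding toward an
    expression embedding; intensity in [0, 1] sets expression strength.
    (Illustrative sketch; not the paper's exact mechanism.)"""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return torch.lerp(neutral_emb, expression_emb, intensity)

# Hypothetical usage with a diffusion pipeline that accepts precomputed
# prompt embeddings (interface assumed, not PixelSmile's API):
# cond = blend_expression(emb_neutral, emb_smile, intensity=0.6)
# edited = pipeline(prompt_embeds=cond, image=source_face).images[0]
```

Because the conditioning is a linear path between two fixed embeddings, sweeping the intensity from 0 to 1 traces a continuous family of edits, which is what makes the control smooth and monotonic rather than a set of discrete expression presets.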
📝 Abstract
Fine-grained facial expression editing has long been limited by intrinsic semantic overlap among expressions. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off between expression editing and identity preservation. We propose PixelSmile, a diffusion framework that disentangles expression semantics via fully symmetric joint training. PixelSmile combines intensity supervision with contrastive learning to produce stronger, more distinguishable expressions, achieving precise and stable linear expression control through interpolation in the textual latent space. Extensive experiments demonstrate that PixelSmile achieves superior disentanglement and robust identity preservation, confirming its effectiveness for continuous, controllable, and fine-grained expression editing while naturally supporting smooth expression blending.
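For intuition on how intensity supervision and contrastive learning can be combined, here is a minimal sketch: a supervised-contrastive term pulls same-expression embeddings together and pushes different expressions apart (addressing semantic overlap), while an MSE term regresses the continuous intensity annotations. Every name, shape, and the exact loss formulation below are illustrative assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def expression_losses(expr_emb: torch.Tensor,       # (B, D) expression embeddings
                      expr_labels: torch.Tensor,    # (B,) integer expression classes
                      intensity_pred: torch.Tensor, # (B,) predicted intensities
                      intensity_gt: torch.Tensor,   # (B,) annotated intensities
                      temperature: float = 0.07) -> torch.Tensor:
    """Illustrative combination of a SupCon-style contrastive term and an
    intensity-regression term; not PixelSmile's exact objectives."""
    emb = F.normalize(expr_emb, dim=-1)
    logits = emb @ emb.T / temperature
    logits.fill_diagonal_(-1e9)              # exclude self-similarity
    log_prob = logits - logits.logsumexp(dim=-1, keepdim=True)
    pos = expr_labels.unsqueeze(0) == expr_labels.unsqueeze(1)
    pos.fill_diagonal_(False)                # self-pairs are not positives
    # Average log-likelihood of positive pairs per anchor.
    contrastive = -(log_prob * pos).sum(-1) / pos.sum(-1).clamp(min=1)
    intensity = F.mse_loss(intensity_pred, intensity_gt)
    return contrastive.mean() + intensity
```

The two terms are complementary: the contrastive term makes expression semantics distinguishable in the latent space, and the intensity term anchors a continuous strength axis, which together support the linear, fine-grained control the abstract describes.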