ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of jointly preserving identity and enabling fine-grained expression control in AI-driven narrative character modeling. We propose a blendshape-guided diffusion framework. Methodologically, we design a blendshape-guided cross-attention module that injects FLAME blendshape parameters into the diffusion process, and we combine an identity-consistent face foundation model, an expression-specific adapter, and a reference-image guidance module to support multimodal (image/video) training. The approach enables hierarchical expression control (from coarse macro-expressions to subtle micro-expressions), cross-sample expression transfer, and temporally smooth expression transitions. Quantitative and qualitative evaluations demonstrate significant improvements over state-of-the-art methods on three key metrics: identity preservation, expression fidelity, and visual naturalness.
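
The blendshape-guided cross-attention described above can be pictured with a short PyTorch sketch. Everything here is an assumption for illustration (the class name `BlendshapeCrossAttention`, a 50-dim FLAME expression vector, 4 condition tokens, residual injection into UNet features); the paper's actual layer placement and dimensions are not given in this summary.

```python
import torch
import torch.nn as nn

class BlendshapeCrossAttention(nn.Module):
    """Minimal sketch of blendshape-guided cross-attention.

    Assumed (not from the paper): 50 FLAME expression coefficients,
    4 condition tokens, residual injection into UNet feature tokens.
    """

    def __init__(self, hidden_dim=320, n_exp=50, n_tokens=4, n_heads=8):
        super().__init__()
        self.n_tokens = n_tokens
        self.hidden_dim = hidden_dim
        # Lift the low-dimensional blendshape vector into condition tokens.
        self.to_tokens = nn.Linear(n_exp, n_tokens * hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)

    def forward(self, x, blendshapes):
        # x: (B, L, hidden_dim) spatial feature tokens from one UNet layer.
        # blendshapes: (B, n_exp) FLAME expression parameters.
        cond = self.to_tokens(blendshapes).view(-1, self.n_tokens, self.hidden_dim)
        # Queries come from image features; keys/values from expression tokens,
        # so the expression signal is injected without overwriting identity.
        out, _ = self.attn(query=x, key=cond, value=cond)
        return x + out

# Example: condition a 64x64 feature map flattened to 4096 tokens.
block = BlendshapeCrossAttention()
feats = torch.randn(2, 4096, 320)
exp = torch.randn(2, 50)
print(block(feats, exp).shape)  # torch.Size([2, 4096, 320])
```

Keeping the blendshape tokens on the key/value side of a residual attention branch is one plausible way to realize the "explicit control without compromising identity" property the summary claims, since the identity pathway of the base model is left untouched.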

📝 Abstract
Human-centric generative models designed for AI-driven storytelling must bring together two core capabilities: identity consistency and precise control over human performance. While recent diffusion-based approaches have made significant progress in maintaining facial identity, achieving fine-grained expression control without compromising identity remains challenging. In this work, we present a diffusion-based framework that faithfully reimagines any subject under any particular facial expression. Building on an ID-consistent face foundation model, we adopt a compositional design featuring an expression cross-attention module guided by FLAME blendshape parameters for explicit control. Trained on a diverse mixture of image and video data rich in expressive variation, our adapter generalizes beyond basic emotions to subtle micro-expressions and expressive transitions, overlooked by prior works. In addition, a pluggable Reference Adapter enables expression editing in real images by transferring the appearance from a reference frame during synthesis. Extensive quantitative and qualitative evaluations show that our model outperforms existing methods in tailored and identity-consistent expression generation. Code and models can be found at https://github.com/foivospar/Arc2Face.
Problem

Research questions and friction points this paper is trying to address.

Achieving identity-consistent facial expression generation
Providing fine-grained control over subtle micro-expressions
Enabling expression editing in real images via reference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion framework with ID-consistent face foundation model
Expression cross-attention module guided by blendshape parameters
Pluggable Reference Adapter enables expression editing in real images (see the sketch after this list)
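
The Reference Adapter contribution can likewise be sketched. This is a minimal illustration under assumed names and shapes (`ReferenceAdapter`, a frozen image encoder producing 768-dim patch tokens), not the paper's actual architecture: appearance tokens extracted from a real reference frame join the expression tokens as extra cross-attention keys/values, so synthesis copies appearance from the reference while the blendshapes dictate the expression.

```python
import torch
import torch.nn as nn

class ReferenceAdapter(nn.Module):
    """Minimal sketch of a pluggable reference-image adapter.

    Assumed setup: patch features from a frozen image encoder (e.g. CLIP)
    are projected into the UNet token space and appended to the
    expression tokens as additional cross-attention keys/values.
    """

    def __init__(self, feat_dim=768, hidden_dim=320):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)

    def forward(self, ref_feats, exp_tokens):
        # ref_feats: (B, P, feat_dim) patch features of the reference frame.
        # exp_tokens: (B, T, hidden_dim) blendshape condition tokens.
        appearance = self.proj(ref_feats)              # (B, P, hidden_dim)
        return torch.cat([exp_tokens, appearance], 1)  # joint condition sequence

adapter = ReferenceAdapter()
ref = torch.randn(2, 257, 768)      # hypothetical patch tokens from a frozen encoder
exp_tok = torch.randn(2, 4, 320)    # blendshape tokens from the module above
print(adapter(ref, exp_tok).shape)  # torch.Size([2, 261, 320])
```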